
Exploring T5 Model Pretraining and MLFlow Integration: A Journey in ML Deployment

bhaskar

Introduction

Recently, I embarked on an exciting journey to pretrain the T5 model with Python and PyTorch Lightning, and to explore its deployment through WebAssembly (WASM) for an experimental project. This experience, combined with discovering MLFlow for experiment tracking, provided valuable insights into modern ML deployment strategies.

T5 Model and WASM Implementation

Pretraining the T5 model and deploying it via WASM is an unconventional approach to model deployment. While exciting, the process surfaced some key challenges:

  • Data preparation complexities for training and testing
  • Need for robust preprocessing pipelines (see the sketch after this list)
  • WASM optimization requirements for model performance
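
To make the preprocessing point concrete, here is a minimal, word-level sketch of T5-style span corruption, the denoising objective T5 pretrains on. It is an illustration under simplifying assumptions: real pipelines operate on tokenizer tokens rather than whitespace-split words, sample span lengths, and pack examples to a fixed sequence length. The function name span_corrupt and all parameter values are my own choices, not from any library.

    import random

    def span_corrupt(words, noise_density=0.15, span_len=3, rng=random.Random(0)):
        """Mask ~noise_density of the words in fixed-length spans.

        Returns (input_text, target_text) in T5's sentinel format, e.g.
        input:  "The quick <extra_id_0> over the lazy dog"
        target: "<extra_id_0> brown fox jumps"
        """
        n_spans = max(1, round(len(words) * noise_density / span_len))
        starts = sorted(rng.sample(range(len(words) - span_len), n_spans))
        inp, tgt, prev_end, sid = [], [], 0, 0
        for s in starts:
            if s < prev_end:              # drop overlapping spans
                continue
            inp += words[prev_end:s] + [f"<extra_id_{sid}>"]
            tgt += [f"<extra_id_{sid}>"] + words[s:s + span_len]
            prev_end, sid = s + span_len, sid + 1
        inp += words[prev_end:]
        return " ".join(inp), " ".join(tgt)

    text = "The quick brown fox jumps over the lazy dog near the river".split()
    print(span_corrupt(text))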

MLFlow Integration Experience

MLFlow emerged as a powerful tool for experiment management.

Here is a simple tracking setup with MLFlow in PyTorch Lightning:

    import logging

    import pytorch_lightning as pl
    from pytorch_lightning.loggers import MLFlowLogger

    std_logger = logging.getLogger(__name__)

    # Log params, metrics, and (with log_model="all") every checkpoint
    # to a local MLflow tracking server.
    mlflow_logger = MLFlowLogger(
        experiment_name="t5_model",
        log_model="all",
        tracking_uri="http://localhost:5000",
    )

    std_logger.info("Configuring trainer...")

    # `config` and `callbacks` are defined elsewhere in the training script.
    trainer = pl.Trainer(
        logger=mlflow_logger,
        callbacks=callbacks,
        max_epochs=config.max_epochs,
        accelerator='gpu' if config.use_gpu else 'cpu',
        devices=config.n_gpu if config.use_gpu else 'auto',
        precision=config.precision,
        strategy='auto',
        log_every_n_steps=1,
        accumulate_grad_batches=config.gradient_accumulation_steps,
        gradient_clip_val=config.max_grad_norm,
        deterministic=True,
        enable_progress_bar=True,
        enable_model_summary=True,
        profiler="simple",
    )
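
With this logger attached, every metric logged through Lightning's self.log(...) calls lands in the MLFlow run, and log_model="all" additionally uploads each checkpoint to the tracking server as an artifact.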

Key benefits discovered:

  • Experiment tracking and versioning
  • Model registry capabilities (example below)
  • Simple deployment workflows
  • Self-hosting flexibility
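
As a concrete example of the registry workflow, the sketch below promotes a logged model to the MLFlow Model Registry. The run ID is a placeholder for whatever run the logger created, and the artifact path "model" is an assumption about where the checkpoint was logged; mlflow.set_tracking_uri and mlflow.register_model are standard MLflow APIs.

    import mlflow

    mlflow.set_tracking_uri("http://localhost:5000")

    # "<run_id>" is a placeholder: copy it from the MLFlow UI, or use
    # mlflow_logger.run_id after training finishes.
    result = mlflow.register_model(
        model_uri="runs:/<run_id>/model",
        name="t5_model",
    )
    print(f"Registered {result.name}, version {result.version}")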

Deployment Considerations

The project revealed various hosting options, each with unique advantages:

  • WASM Deployment

    • Client-side execution
    • Reduced server costs
    • Browser compatibility challenges
  • MLFlow Self-Hosting (see the sketch after this list)

    • Complete control over infrastructure
    • Custom configuration options
    • Cost-effective for small teams
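
For self-hosting, a single-machine MLFlow server can be started with the standard CLI, e.g. `mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000`. The sketch below then points a client at that server and records a smoke-test run; the parameter and metric values are purely illustrative dummies, not results.

    import mlflow

    # Point the client at the self-hosted server and pick an experiment.
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("t5_model")

    # Smoke-test run with illustrative (dummy) values.
    with mlflow.start_run(run_name="smoke-test"):
        mlflow.log_param("max_epochs", 3)
        mlflow.log_metric("val_loss", 2.41)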

Future Work

Moving forward, focus areas include:

  • Improving training data quality
  • Optimizing WASM performance
  • Exploring hybrid deployment approaches
  • Enhancing MLFlow integration

This experience demonstrates the evolving landscape of ML deployment, highlighting the importance of choosing the right tools and platforms for specific use cases.

Depending on the use case, different tracking and deployment strategies make sense:

| Platform Type   | Service         | Key Features                                                                        | Pricing Model                      | Best For                  |
|-----------------|-----------------|-------------------------------------------------------------------------------------|------------------------------------|---------------------------|
| Cloud Platforms | AWS SageMaker   | Fully managed ML platform; built-in monitoring; auto-scaling; integrated with AWS    | Pay-per-use (compute + storage)    | Enterprise ML deployments |
| Cloud Platforms | Google Cloud AI | TensorFlow integration; AutoML features; global infrastructure; serverless options   | Pay-per-use + resource allocation  | TensorFlow workloads      |
| Cloud Platforms | Azure ML        | Enterprise security; automated ML; Azure integration; MLOps support                  | Subscription + consumption         | Microsoft ecosystem users |
| Self-Hosted     | Docker          | Platform independent; custom configurations; version control; local testing          | Infrastructure costs               | Custom model deployment   |
| Self-Hosted     | Kubernetes      | Container orchestration; load balancing; high availability; scalability              | Infrastructure + maintenance costs | Large-scale deployments   |
| Specialized     | Hugging Face    | Transformer models; easy deployment; community support; model hub                    | Free tier + premium plans          | NLP applications          |
| Specialized     | MLflow          | Experiment tracking; model registry; open source; deployment tools                   | Free (open source) + hosting costs | ML lifecycle management   |
