Exploring T5 Model Pretraining and MLFlow Integration: A Journey in ML Deployment
Introduction
Recently, I embarked on an exciting journey to pretrain the T5 model with PyTorch Lightning in Python and to explore its deployment through WebAssembly (WASM) for an experimental project. This experience, combined with discovering MLFlow for experiment tracking, provided valuable insights into modern ML deployment strategies.
T5 Model and WASM Implementation
Attempting to pretrain the T5 model and deploy it via WASM was an innovative approach to model deployment. While exciting, the process highlighted some key challenges (a preprocessing sketch follows the list):
- Data preparation complexities for training and testing
- Need for robust preprocessing pipelines
- WASM optimization requirements for model performance
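To make the data-preparation point concrete, here is a minimal sketch of T5-style span corruption, assuming the Hugging Face transformers tokenizer; the masking probability, span lengths, and helper name are illustrative rather than the exact recipe used in the project:

import random

from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

def corrupt_spans(text, mask_prob=0.15, max_span=3):
    """Replace random word spans with T5 sentinel tokens (span corruption)."""
    words = text.split()
    inputs, targets = [], []
    sentinel, i = 0, 0
    while i < len(words):
        if random.random() < mask_prob and sentinel < 100:  # T5 has 100 sentinel tokens
            span = random.randint(1, max_span)
            inputs.append(f"<extra_id_{sentinel}>")          # sentinel replaces the span in the input
            targets.append(f"<extra_id_{sentinel}>")         # target holds the sentinel plus the dropped words
            targets.extend(words[i:i + span])
            sentinel += 1
            i += span
        else:
            inputs.append(words[i])
            i += 1
    return " ".join(inputs), " ".join(targets)

src, tgt = corrupt_spans("The quick brown fox jumps over the lazy dog.")
batch = tokenizer(src, text_target=tgt, return_tensors="pt", padding=True)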
MLFlow Integration Experience
MLFlow emerged as a powerful tool for experiment management:
Here is a simple tracking setup with MLFlow in PyTorch Lightning:
import logging

import pytorch_lightning as pl
from pytorch_lightning.loggers import MLFlowLogger

std_logger = logging.getLogger(__name__)

# Send metrics, parameters, and checkpoints to a local MLflow tracking server.
mlflow_logger = MLFlowLogger(
    experiment_name="t5_model",
    log_model="all",  # upload every checkpoint as an MLflow artifact
    tracking_uri="http://localhost:5000",
)

std_logger.info("Configuring trainer...")

# `config` (hyperparameters) and `callbacks` are defined elsewhere in the training script.
trainer = pl.Trainer(
    logger=mlflow_logger,
    callbacks=callbacks,
    max_epochs=config.max_epochs,
    accelerator='gpu' if config.use_gpu else 'cpu',
    devices=config.n_gpu if config.use_gpu else None,
    precision=config.precision,
    strategy='auto',
    log_every_n_steps=1,
    accumulate_grad_batches=config.gradient_accumulation_steps,
    gradient_clip_val=config.max_grad_norm,
    deterministic=True,
    enable_progress_bar=True,
    enable_model_summary=True,
    profiler="simple",
)
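With this configuration, each training run appears in the MLflow UI under the t5_model experiment: metrics are logged every step (log_every_n_steps=1), and because log_model="all" the Lightning checkpoints are uploaded as run artifacts. A tracking server is assumed to already be running at http://localhost:5000.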
Key benefits discovered (a short example follows the list):
- Experiment tracking and versioning
- Model registry capabilities
- Simple deployment workflows
- Self-hosting flexibility
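To illustrate the tracking and registry pieces outside of Lightning, the sketch below uses the plain MLflow Python API; the run name, values, and registered-model name are illustrative rather than taken from the project:

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("t5_model")

with mlflow.start_run(run_name="t5-registry-demo") as run:
    mlflow.log_param("max_epochs", 10)           # parameters appear on the run page
    mlflow.log_metric("val_loss", 2.31, step=1)  # metrics are plotted against steps

# Once a run contains a model saved in MLflow's model format (for example via
# mlflow.pytorch.log_model(model, "model")), that artifact can be promoted into
# the model registry under a chosen name ("t5_pretrained" is illustrative):
# mlflow.register_model(f"runs:/{run.info.run_id}/model", "t5_pretrained")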
Deployment Considerations
The project revealed various hosting options, each with unique advantages:
WASM Deployment
- Client-side execution
- Reduced server costs
- Browser compatibility challenges (see the export sketch below)
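One way to address the optimization and browser-compatibility points above is to export the checkpoint to ONNX, which onnxruntime-web can then execute in the browser through its WASM backend. Below is a minimal sketch assuming the Hugging Face optimum package, with the public t5-small checkpoint standing in for the project's pretrained weights:

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5TokenizerFast

checkpoint = "t5-small"  # stand-in for the project's own pretrained checkpoint

# Convert the PyTorch weights to ONNX graphs (encoder and decoder).
model = ORTModelForSeq2SeqLM.from_pretrained(checkpoint, export=True)
tokenizer = T5TokenizerFast.from_pretrained(checkpoint)

# Save the ONNX files and tokenizer assets for the browser bundle.
model.save_pretrained("t5_onnx")
tokenizer.save_pretrained("t5_onnx")

The saved directory can then be packaged and fetched by the browser application.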
MLFlow Self-Hosting
- Complete control over infrastructure
- Custom configuration options
- Cost-effective for small teams
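In practice, self-hosting for a small team can be as lightweight as running mlflow server --backend-store-uri sqlite:///mlflow.db --host 0.0.0.0 --port 5000 on a single machine, swapping in a proper database and object storage for the backend and artifact stores as usage grows.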
Future Work
Moving forward, focus areas include:
- Improving training data quality
- Optimizing WASM performance
- Exploring hybrid deployment approaches
- Enhancing MLFlow integration
This experience demonstrates the evolving landscape of ML deployment, highlighting the importance of choosing the right tools and platforms for specific use cases.
Depending on the use case, different tracking and deployment strategies make sense:
| Platform Type | Service | Key Features | Pricing Model | Best For |
|---|---|---|---|---|
| Cloud Platforms | AWS SageMaker | Fully managed ML platform; built-in monitoring; auto-scaling; integrated with AWS | Pay-per-use (compute + storage) | Enterprise ML deployments |
| Cloud Platforms | Google Cloud AI | TensorFlow integration; AutoML features; global infrastructure; serverless options | Pay-per-use + resource allocation | TensorFlow workloads |
| Cloud Platforms | Azure ML | Enterprise security; automated ML; Azure integration; MLOps support | Subscription + consumption | Microsoft ecosystem users |
| Self-Hosted | Docker | Platform independent; custom configurations; version control; local testing | Infrastructure costs | Custom model deployment |
| Self-Hosted | Kubernetes | Container orchestration; load balancing; high availability; scalability | Infrastructure + maintenance costs | Large-scale deployments |
| Specialized | Hugging Face | Transformer models; easy deployment; community support; model hub | Free tier + premium plans | NLP applications |
| Specialized | MLflow | Experiment tracking; model registry; open source; deployment tools | Free (open source) + hosting costs | ML lifecycle management |