2. What is MLOps?
Answer
MLOps is the practice of applying DevOps principles to Machine Learning systems.
It covers:
Data Management
Model Development
Model Versioning
Deployment
Monitoring
Retraining
Lifecycle
Data Collection
↓
Data Validation
↓
Feature Engineering
↓
Model Training
↓
Model Validation
↓
Deployment
↓
Monitoring
↓
Retraining
3. Difference between DevOps and MLOps?
DevOps | MLOps
Focuses on application code | Focuses on data + model + code
CI/CD | CI/CD/CT
Version code | Version code + data + models
Functional testing | Model testing
Performance monitoring | Model drift monitoring
4. What is CI/CD/CT in MLOps?
CI
Continuous Integration
Code Commit
↓
Unit Tests
↓
Build
CD
Continuous Delivery
Build
↓
Deploy
CT
Continuous Training
New Data
↓
Retrain Model
↓
Validate
↓
Deploy
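The CT flow above can be sketched as a single gated step. This is a minimal illustration, not a production pipeline: the `retrain` callable, the dict layout, and the `min_gain` threshold are assumptions made for the example.

```python
def continuous_training_step(new_data, retrain, production, min_gain=0.0):
    """Retrain on new data, then deploy only if validation passes."""
    candidate = retrain(new_data)
    # Validate: the candidate must match or beat production by min_gain.
    if candidate["accuracy"] >= production["accuracy"] + min_gain:
        return candidate  # Deploy: the candidate becomes production.
    return production  # Validation failed: keep the current model.


# Hypothetical retrain functions that return fixed scores for illustration.
prod = {"name": "v1", "accuracy": 0.90}
better = continuous_training_step(
    [1, 2, 3], lambda data: {"name": "v2", "accuracy": 0.93}, prod
)
worse = continuous_training_step(
    [1, 2, 3], lambda data: {"name": "v2", "accuracy": 0.85}, prod
)
```

The validation gate is what separates CT from blind scheduled retraining: a model that scores worse than production never gets deployed.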
5. How do you version ML models?
Tools
MLflow
DVC
S3
Git
Example:
import mlflow
import mlflow.sklearn
# `model` is a fitted scikit-learn estimator
mlflow.sklearn.log_model(model, "customer_churn")
Version:
v1
v2
v3
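To show what the v1, v2, v3 sequence means in practice, here is a toy in-memory registry. It is only a sketch of the idea: `ModelRegistry` is a hypothetical class for this example and is not how MLflow or DVC store models.

```python
import hashlib
import pickle


class ModelRegistry:
    """Toy in-memory registry: each registered model gets the next version."""

    def __init__(self):
        self._versions = {}  # model name -> list of (version, checksum, bytes)

    def register(self, name, model):
        payload = pickle.dumps(model)
        # The checksum lets you verify that an artifact was not altered.
        checksum = hashlib.sha256(payload).hexdigest()
        entries = self._versions.setdefault(name, [])
        version = f"v{len(entries) + 1}"
        entries.append((version, checksum, payload))
        return version

    def load(self, name, version):
        for v, _, payload in self._versions[name]:
            if v == version:
                return pickle.loads(payload)
        raise KeyError(f"{name}:{version} not found")


registry = ModelRegistry()
first = registry.register("customer_churn", {"weights": [0.1, 0.2]})
second = registry.register("customer_churn", {"weights": [0.3, 0.4]})
```

Older versions stay loadable, which is what makes rollback possible.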
6. Explain MLflow
Components
Tracking
Projects
Models
Registry
Example
import mlflow

with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    mlflow.log_metric("accuracy", 0.95)
Interview Follow-up:
Why MLflow?
Answer:
Track experiments, compare runs, register models, and manage deployments.
7. What is Data Drift?
Answer
The distribution of the input data changes over time.
Example:
Training:
Age: 20-40
Production:
Age: 50-80
Model performance may drop because the model now sees inputs it was not trained on.
8. What is Concept Drift?
Answer
The relationship between the features and the target changes over time.
Example:
Before Covid:
Online spending low
After Covid:
Online spending high
Same inputs but different outcomes.
9. How do you detect drift?
Methods
PSI (Population Stability Index)
KL Divergence
Wasserstein Distance
KS Test
Example:
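from scipy.stats import ks_2samp
# train_data, prod_data: 1-D samples of the same feature
statistic, p_value = ks_2samp(train_data, prod_data)
# A small p-value (for example < 0.05) suggests the two distributions differ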
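PSI from the list above can be computed without extra libraries. The sketch below bins both samples on the training range and sums (actual - expected) * ln(actual / expected) over the bins. The bin count, the `eps` floor, and the sample data are assumptions for illustration. A commonly quoted rule of thumb treats PSI above 0.25 as a significant shift.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train_ages = [20 + i % 21 for i in range(1000)]  # ages 20-40
prod_ages = [50 + i % 31 for i in range(1000)]   # ages 50-80
same = psi(train_ages, train_ages)
shifted = psi(train_ages, prod_ages)
```

Comparing a sample with itself gives a PSI of zero, while the shifted age range from the data drift example produces a value far above the 0.25 rule of thumb.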
10. How do you monitor models?
Metrics
Business Metrics
Revenue
Conversion
CTR
Model Metrics
Accuracy
Precision
Recall
F1
System Metrics
CPU
Memory
Latency
Throughput
Tools:
Prometheus
Grafana
ELK
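The model metrics listed above can be computed from logged predictions once ground-truth labels arrive. A minimal sketch for binary labels follows. The function name and the sample labels are made up for the example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

In production, labels often arrive late, so these values are usually computed on a delay and plotted next to the system metrics.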
11. Explain Model Retraining Pipeline
New Data
↓
Validation
↓
Feature Engineering
↓
Training
↓
Evaluation
↓
Deployment
Trigger:
Weekly
Monthly
Drift detection
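The triggers above can be combined in one check. This is a sketch under assumptions: the 7-day maximum age, the 0.25 drift threshold, and the function name are illustrative values, not fixed rules.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, drift_score,
                   max_age=timedelta(days=7), drift_threshold=0.25):
    """Return the reason to retrain, or None if no trigger fired."""
    # Drift takes priority: retrain early if the data has shifted.
    if drift_score > drift_threshold:
        return "drift"
    # Otherwise fall back to the fixed schedule (weekly here).
    if now - last_trained >= max_age:
        return "schedule"
    return None


now = datetime(2024, 1, 15)
no_trigger = should_retrain(datetime(2024, 1, 12), now, drift_score=0.05)
by_schedule = should_retrain(datetime(2024, 1, 1), now, drift_score=0.05)
by_drift = should_retrain(datetime(2024, 1, 12), now, drift_score=0.40)
```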
12. What is Feature Store?
Answer
Central repository for ML features.
Benefits:
Reuse features
Consistency
Online serving
Offline training
Tools:
Feast
Tecton
13. Explain Docker in MLOps
Dockerfile
FROM python:3.11
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python","app.py"]
Benefits:
Portability
Reproducibility
14. Difference between Docker and Kubernetes?
Docker | Kubernetes
Containerization | Orchestration
Single container | Multiple containers
Packaging | Scaling
15. How do you deploy ML models on Kubernetes?
Steps
Build Docker Image
↓
Push to Registry
↓
Create Deployment
↓
Create Service
↓
Expose API
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model
spec:
  replicas: 3
  # selector and pod template (container image, ports) omitted for brevity
16. What is Canary Deployment?
Answer
Deploy the new model to a small percentage of users first, then widen the rollout if it performs well.
90% → Old Model
10% → New Model
If successful:
100% New Model
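The traffic split above can be sketched with a deterministic hash, so that each user always lands on the same model. In practice the split is usually done at the load balancer or service mesh. The `route` function and the user IDs here are hypothetical.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Deterministically send canary_percent% of users to the new model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    return "new" if bucket < canary_percent else "old"


users = [f"user-{i}" for i in range(10_000)]
share_new = sum(route(u, 10) == "new" for u in users) / len(users)
```

Raising `canary_percent` to 100 completes the rollout, and setting it to 0 rolls back.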
17. Blue-Green Deployment?
Answer
Blue = Production
Green = New Version
Switch traffic instantly.
Benefits:
Zero downtime
Easy rollback
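The switch can be pictured as flipping a single pointer between two fully deployed environments. The sketch below uses a hypothetical `BlueGreenRouter` class with stand-in lambdas for the two model versions.

```python
class BlueGreenRouter:
    """Both environments stay deployed; only the live pointer changes."""

    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"

    def predict(self, x):
        return self.environments[self.live](x)

    def switch(self):
        # Flip all traffic at once; calling it again is the rollback.
        self.live = "green" if self.live == "blue" else "blue"


router = BlueGreenRouter(blue=lambda x: "old", green=lambda x: "new")
before = router.predict(1)
router.switch()
after = router.predict(1)
router.switch()  # rollback
rolled_back = router.predict(1)
```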
18. How would you deploy a model with zero downtime?
Answer:
Kubernetes Rolling Update
Blue-Green Deployment
Canary Deployment
19. How do you handle large datasets?
Techniques
Spark
Partitioning
Parallel Processing
Example:
df = df.repartition(100)  # returns a new DataFrame; the original is unchanged
20. What if training data is 1 TB?
Answer
Never load the full dataset into memory at once.
Use:
Spark
Batch Processing
Distributed Training
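Batch processing means streaming the data in fixed-size chunks and keeping only running aggregates in memory. Here is a single-machine sketch of that idea. The helper names and the batch size are assumptions, and Spark would distribute the same pattern across workers.

```python
from itertools import islice


def read_in_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def streaming_mean(rows, batch_size=1000):
    """Compute a mean while holding only one batch in memory."""
    total, count = 0.0, 0
    for batch in read_in_batches(rows, batch_size):
        total += sum(batch)
        count += len(batch)
    return total / count if count else 0.0


# range() is lazy, so the million values are never materialized at once.
mean = streaming_mean(range(1, 1_000_001), batch_size=10_000)
```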
21. What if model training takes 12 hours?
Answer
Options:
Distributed Training
GPU
Faster hyperparameter search (early stopping, fewer trials)
Incremental Learning
22. Explain Kubernetes HPA
Horizontal Pod Autoscaler
CPU > 70%
Scale:
3 Pods → 10 Pods
Example:
kubectl autoscale deployment model --cpu-percent=70 --min=3 --max=10
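The scaling decision follows the documented HPA formula: desired replicas = ceil(current replicas * current metric / target metric), clamped to the min and max. The sketch below reproduces only that arithmetic. It ignores the tolerance band and stabilization windows that the real controller applies, and the utilization values are made up.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas, max_replicas):
    """Core HPA arithmetic, without tolerance or stabilization."""
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(desired, max_replicas))


scale_up = desired_replicas(3, current_cpu=140, target_cpu=70,
                            min_replicas=3, max_replicas=10)
capped = desired_replicas(3, current_cpu=300, target_cpu=70,
                          min_replicas=3, max_replicas=10)
floor = desired_replicas(3, current_cpu=35, target_cpu=70,
                         min_replicas=3, max_replicas=10)
```

CPU utilization is measured relative to the pods' CPU requests, which is why values above 100% are possible.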
23. What happens if a pod crashes?
Answer
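Kubernetes recovers automatically. The kubelet restarts a crashed container, and if the pod itself is lost, the ReplicaSet controller creates a replacement to maintain the desired number of replicas.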
24. How do you secure ML APIs?
Methods
Authentication
JWT
OAuth
Encryption
HTTPS
TLS
Secrets
Kubernetes Secrets
AWS Secrets Manager
25. Explain FastAPI deployment
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def predict():
    return {"prediction": 1}

Run (assuming the file is saved as app.py):
uvicorn app:app
26. What is Model Explainability?
Techniques
SHAP
LIME
Feature Importance
Example:
import shap
SHAP values attribute a prediction to each input feature, showing why the model produced it.
27. Scenario: Accuracy dropped from 95% to 70%
Approach
Check:
Data Drift
Concept Drift
Data Quality
Pipeline Failures
Feature Changes
Then:
Retrain
Validate
Redeploy
28. Scenario: Prediction API latency increased
Investigate
CPU
Memory
Network
Database
Model Size
Optimization:
Caching
Autoscaling
Quantization
GPU inference
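Caching is often the cheapest of these optimizations when the same inputs recur. A minimal sketch with the standard library follows. The `predict` function is a stand-in for a real model call, and features must be hashable (a tuple here).

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def predict(features: tuple) -> int:
    """Stand-in for an expensive model call."""
    return int(sum(features) > 1.0)


first = predict((0.5, 0.7))   # computed
second = predict((0.5, 0.7))  # served from the cache
stats = predict.cache_info()
```

Caching only helps when requests repeat, and cached entries must be cleared (`predict.cache_clear()`) whenever a new model version is deployed.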
29. Scenario: Production model gives different results than training
Root Causes
Feature mismatch
Data preprocessing mismatch
Version mismatch
Missing transformations
Solution:
Use the same fitted preprocessing pipeline object for training and serving.
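One way to picture this: fit the preprocessing once, persist the fitted state, and load that exact state at serving time instead of re-implementing the transformation. The toy `ScalerPipeline` class and the JSON format below are assumptions for illustration. With scikit-learn you would typically persist the whole fitted `Pipeline` object.

```python
import json
import statistics


class ScalerPipeline:
    """Toy pipeline: standardize one feature, then apply a threshold model."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0
        return self

    def predict(self, value):
        z = (value - self.mean) / self.std
        return int(z > 0)

    def to_json(self):
        return json.dumps({"mean": self.mean, "std": self.std})

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        pipeline = cls()
        pipeline.mean, pipeline.std = state["mean"], state["std"]
        return pipeline


# Training side: fit once and persist the fitted state.
pipeline = ScalerPipeline().fit([10.0, 20.0, 30.0, 40.0])
artifact = pipeline.to_json()

# Serving side: load the same state rather than refitting or re-coding it.
serving_pipeline = ScalerPipeline.from_json(artifact)
```

Because serving reuses the training-time mean and standard deviation, the same input yields the same prediction in both environments.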
30. Design an End-to-End MLOps Architecture
Data Sources
↓
Kafka
↓
Spark
↓
Feature Store
↓
Training Pipeline
↓
MLflow
↓
Model Registry
↓
Docker
↓
Kubernetes
↓
FastAPI
↓
Prometheus/Grafana
↓
Retraining Pipeline
Advanced EPAM Follow-up Questions
Why use Kubernetes instead of ECS?
Multi-cloud support
Better ecosystem
Advanced autoscaling
Service mesh support
Why MLflow over DVC?
Experiment tracking
Model registry
Deployment integration
How