Build and scale the infrastructure that powers AI at enterprise scale. Design robust automated systems that enable data scientists and ML engineers to deploy, monitor, and maintain machine learning models in production environments.
Key Responsibilities:
- Design and implement MLOps pipelines for model training, deployment, and monitoring
- Build automated CI/CD systems for machine learning model lifecycle management
- Develop infrastructure for real-time and batch ML inference at scale
- Implement model monitoring, drift detection, and automated retraining systems
- Design data pipelines and feature stores for ML model development and serving
- Collaborate with data science teams to productionize research models
- Optimize ML infrastructure for performance, cost, and reliability
- Implement security and compliance controls for ML systems and data
Requirements
Bachelor's or Master's degree in Computer Science, Engineering, or related field
4+ years of experience in DevOps/Infrastructure, with 2+ years focused on ML systems
Proficiency in container technologies (Docker, Kubernetes) and cloud platforms
Experience with ML frameworks (TensorFlow, PyTorch, Scikit-learn) and MLOps tools
Strong programming skills in Python, with knowledge of infrastructure-as-code
Experience with data pipeline tools (Airflow, Kafka, Spark) and databases
Understanding of ML model serving frameworks and API development
Knowledge of monitoring tools and observability practices for ML systems
Benefits
Compensation Range: $140,000 - $250,000 plus equity