Job Title: Lead MLOps Engineer
Work Location: Jersey City, NJ (Hybrid)
Type: Contract
About the Role
As a Lead MLOps Engineer, you will own the infrastructure and operationalization of machine learning and Generative AI systems at scale. This includes building reliable pipelines for the training, evaluation, deployment, and monitoring of models in production environments.
What You'll Do
- ML/LLMOps Architecture: Design end-to-end pipelines for training and deploying models, including LLM fine-tuning and RAG architectures.
- Deployment & Scaling: Build scalable services using Docker, Kubernetes, and cloud-native tools (AWS/GCP/Azure).
- Pipeline Automation: Develop CI/CD pipelines for ML workflows (GitHub Actions, Jenkins), including automated testing and validation of models.
- Monitoring & Observability: Implement monitoring for model performance, latency, drift, and cost (Prometheus, Grafana, custom metrics).
- Data & Feature Pipelines: Integrate with data platforms (Spark, Kafka, Airflow) and feature stores.
- Cost Optimization: Optimize GPU/compute usage for large-scale model training and inference.
What You'll Bring
- 8 years of experience in software/ML engineering with production systems.
- Strong Python skills and experience building scalable APIs/services.
- Experience with ML platforms (MLflow, SageMaker, Vertex AI).
- Deep understanding of distributed systems and data pipelines.
- Hands-on experience with LLMs, embeddings, and vector databases.