Job Title: MLOPS PLATFORM ENGINEER
Location: Reston, VA
Duration: 12 Months
Visa: USC, GC, and H1B
Contract Type: W2
Description:
- The Data Modeling Analytics & AI Engineering team is seeking an experienced MLOps Platform Engineer to design, build, and support enterprise-grade machine learning operations capabilities. This role will play a key part in enabling scalable, reliable, and secure ML model development and deployment across our cloud and container platforms.
- This is a hands-on engineering role requiring strong expertise in AWS, Kubernetes (EKS), CI/CD automation, containerization, and ML platform operations. The ideal candidate will have solid engineering fundamentals combined with practical knowledge of ML workflows, deployment patterns, and platform reliability.
Key Responsibilities
Platform Engineering & Operations
- Engineer, manage, and support MLOps platform components across AWS and EKS-based environments.
- Oversee deployment, configuration, and operation of infrastructure used for ML training, batch inference, and real-time model serving.
- Ensure platform availability, resilience, and performance across dev, test, and production environments.
- Implement role-based access controls (RBAC), network policies, and scalable namespace designs within EKS.
Model Deployment & CI/CD Automation
- Build and support CI/CD pipelines (GitLab) for model packaging, container image builds, vulnerability scanning, and automated deployment.
- Develop standardized model release processes, including environment promotion, versioning, and rollback.
- Integrate CI/CD with ML frameworks, model repositories, artifacts, and runtime environments.
Container & Kubernetes Workloads
- Design and manage EKS workloads supporting containerized ML jobs and microservices.
- Implement auto-scaling, resource quotas, cluster optimization, and multi-tenant workload support for GPU and CPU-based training/inference workloads.
Monitoring, Observability & Optimization
- Implement logging, monitoring, and alerting for ML pipelines, model endpoints, batch jobs, and platform components.
- Analyze compute, storage, and data transfer usage to optimize cost efficiency across ML workloads.
- Perform incident response, root cause analysis, and long-term remediation planning.
Collaboration & Enablement
- Partner with Data Scientists, ML Engineers, and application teams to operationalize end-to-end machine learning workflows.
- Provide technical guidance on best practices for ML model lifecycle management, deployment patterns, and scalable architectures.
- Contribute to documentation, runbooks, onboarding materials, and internal knowledge bases.
Required Qualifications:
- 3 years of hands-on experience with AWS services, including EKS, EC2, S3, IAM, and CloudWatch.
- Strong experience operating and troubleshooting Kubernetes (preferably AWS EKS).
- Proficiency in containerization (Docker) and orchestration concepts.
- Strong programming/scripting experience in Python and Bash.
- Experience building and managing CI/CD pipelines (GitLab or equivalent).
- Familiarity with machine learning workflows, including training, inference, and model monitoring.
- Experience with infrastructure-as-code (Terraform or CloudFormation).
- Experience supporting production platforms, including incident management and root cause analysis.
Preferred Qualifications
- Experience managing Data Analytics Platforms / Tools (e.g., Domino, SageMaker).
- Experience with ML lifecycle tools such as MLflow or similar.
- Experience supporting GPU-based workloads or distributed training environments.
- Familiarity with enterprise MLOps architectures and patterns (batch, real-time, microservices).
- Understanding of data processing frameworks and feature pipelines.
Other Competencies
- Strong analytical, troubleshooting, and problem-solving skills.
- Effective communication and documentation abilities.
- Ability to collaborate across engineering, analytics, and product teams.
- Self-motivated with the ability to drive initiatives independently.
- Ability to work in a complex, regulated enterprise environment.