SageMaker MLOps Platform Engineer

Rishabh RPO


Job Location:

Plano, TX - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Title: MLOps Platform Engineer (SageMaker)
Location: Plano TX (Onsite)
Duration: 12 Months

Description:

RM Notes:

  • Export Control form will be required during onboarding only and is not required at the time of submission.
  • This position is with the Enterprise Analytical Data & Integration Team.
  • The hiring manager is looking to onboard an experienced MLOps Platform Engineer with strong expertise in AWS and Amazon SageMaker.
  • Local candidates are preferred.
  • 12-month contract with possible extension.
  • Onsite role.

Must-Have Skills

  • 10 to 15 years of software engineering experience focused on cloud infrastructure or ML platform operations.
  • 5 years of hands-on AWS experience, including deep expertise in Amazon SageMaker (Studio Classic/Studio, Pipelines, Model Registry, Endpoints, Feature Store).
  • 3 years of experience building and operating production MLOps pipelines, including training, versioning, deployment, monitoring, and rollback.
  • Experience with SageMaker Unified Studio or Studio Classic, including domain/project setup, blueprints, and multi-tenant configuration.
  • MLflow or equivalent experiment tracking tools.
  • SageMaker Pipelines or similar workflow orchestration tools (Airflow, Step Functions).
  • SageMaker Unified Studio experience is preferred; Studio Classic experience is mandatory.

What We're Looking For

The Toyota Financial Services Enterprise Platforms team is seeking a Senior ML Platform Engineer to design, build, and operationalize an enterprise ML platform on AWS SageMaker Unified Studio.

The selected candidate will help migrate the organization from a fragmented ML toolchain to a unified, governed platform on AWS Landing Zone 2 that supports the complete machine learning lifecycle, from data discovery through model deployment and monitoring.

Key Responsibilities

  • Set up the SageMaker Unified Studio platform, including domain configuration, project provisioning, persona-based roles, and multi-environment (Dev, Prod-UAT, Prod) promotion workflows.
  • Build MLOps pipelines using SageMaker Pipelines for data extraction from Snowflake, preprocessing, training, evaluation, and model registration.
  • Manage SageMaker Model Registry, including cross-account model promotion, versioning, immutability, and lineage tracking.
  • Configure MLflow experiment tracking with auto-logging of parameters, metrics, and artifacts.
  • Set up identity and access management, including Okta SSO, SailPoint entitlements, persona-based execution roles, and service roles for pipelines.
  • Build model serving solutions using real-time SageMaker endpoints and batch prediction workflows.
  • Implement model monitoring for data drift, model drift, and performance degradation detection.
  • Configure data catalog capabilities, including searchable datasets, access-level visibility, access-request workflows, and lineage tracking.
  • Own platform operations, including observability (CloudWatch, Datadog), logging, custom images, and instance availability management.

Required Qualifications

Qualifications / What You Bring (Must-Haves)

  • 10 to 15 years of software engineering experience focused on cloud infrastructure or ML platform operations.
  • 5 years of hands-on AWS experience with strong expertise in:
    • Amazon SageMaker Studio
    • SageMaker Pipelines
    • Model Registry
    • Endpoints
    • Feature Store
  • 3 years of experience building and operating production MLOps pipelines, including training, versioning, deployment, monitoring, and rollback.
  • Experience with SageMaker Unified Studio or Studio Classic, including domain/project setup, blueprints, and multi-tenant configurations.
  • Infrastructure-as-Code experience using Terraform, CDK, or CloudFormation.
  • IAM design for ML platforms, including execution roles, service roles, cross-account access, Lake Formation, and SSO/SAML.
  • MLflow or equivalent experiment tracking platform experience.
  • SageMaker Pipelines or similar orchestration frameworks (Airflow, Step Functions).
  • Experience with model serving, including real-time endpoints, batch transform, auto-scaling, and endpoint monitoring.
  • Experience using Snowflake as a data source for ML pipelines.
  • Kubernetes (EKS) and container orchestration experience.
  • Strong understanding of networking and security concepts, including VPCs, security groups, private endpoints, and cross-account connectivity.

Preferred Qualifications

  • SageMaker Unified Studio domain provisioning, custom blueprints, and project standardization.
  • SageMaker Feature Store for online/offline feature management.
  • SageMaker Model Monitor, including data quality checks, bias detection, and drift detection.
  • AWS Machine Learning Specialty Certification.