AIML Data Engineer, Machine Learning Platform Technologies

Apple

Job Location:

Seattle, OR - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

In this role, you'll be architecting and building Apple's next-generation ML dataset management platform. This platform enables ML teams across the company to efficiently manage the full lifecycle of datasets, from initial curation and annotation through versioning, model training and evaluation, and sharing. You'll design scalable infrastructure that supports dataset operations at massive scale while maintaining strong governance guarantees. Your work will include building data lineage tracking systems, implementing automated compliance workflows, creating intuitive APIs and SDKs for dataset access, and ensuring seamless integration with ML training and evaluation pipelines.

You'll collaborate with teams building customer-facing ML features across iOS, macOS, and other Apple platforms, as well as compute infrastructure teams and ML framework owners. Your platform work directly enables the ML innovations that millions of customers experience daily. This role offers the opportunity to have broad impact across Apple's ML initiatives and to shape how thousands of ML practitioners build the intelligent experiences our customers love.


  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience
  • 10 years building and scaling data infrastructure for petabyte-scale ML workloads with high reliability
  • Deep expertise in modern data technologies (Apache Iceberg, Spark, S3, distributed systems), data modeling, schema evolution, and efficient storage formats (Parquet, Arrow, ORC)
  • Experience building data pipelines that handle diverse ML data types: structured/tabular data, unstructured media (images, video, audio), embeddings, and multimodal datasets
  • Proven track record building dataset management systems, including versioning, metadata management, discovery, and integration with production ML training pipelines
  • Experience designing data governance frameworks, including lineage tracking, access control, retention policies, and compliance workflows
  • Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes)
  • Strong cross-functional collaboration skills to understand diverse stakeholder needs and articulate technical decisions across ML engineering, data science, legal, and product teams


  • Hands-on experience curating or managing datasets for production ML models
  • Experience with data cataloging systems, metadata platforms, MLOps tools, or ML training frameworks
  • Knowledge of privacy-preserving technologies and data quality/validation frameworks

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...