Role summary
We're hiring a hands-on technologist to lead the design, delivery, and operational scaling of mission-critical AI/ML systems. This is a senior technical leadership role combining end-to-end system design, deep ML engineering, research translation, and team enablement. You will set technical direction, unblock engineering teams, and take direct ownership of measurable business outcomes.
Core responsibilities
- Own end-to-end system design and architecture for production ML/AI solutions, from problem framing, data design, model selection, and infrastructure to monitoring, runbooks, and cost control.
- Lead hands-on technical delivery: prototype, validate, harden, and ship models and agentic components into live systems; ensure reliability, observability, and automated CI/CD for models and data pipelines.
- Act as the single technical escalation point: remove blockers, resolve cross-team technical tradeoffs, and make final architecture decisions that balance performance, scalability, cost, and vendor dependencies.
- Mentor and grow engineering teams (ML engineers, data engineers, operations research (OR) engineers), set engineering standards, conduct code/architecture reviews, and champion best practices (MLOps, testing, data contracts).
- Translate research into product: evaluate papers, run experiments, lead IP efforts (patents, trade secrets), and supervise research-to-production pipelines.
- Drive multiple projects in parallel with clear prioritization, milestones, and delivery SLAs; align technical plans to business KPIs.
- Define, track, and report success metrics and ROI for all ML initiatives; continuously tune models and design experiments for measurable impact.
- Collaborate with product, platform, security, legal, and operations teams to ensure compliance, data privacy, and safe, explainable model behavior.
- Work closely with Product, Platform, Security, Legal, and Business stakeholders.
- Become a visible technical leader within the company and represent technical strategy externally when needed.
Qualifications:
Must-have qualifications & experience
- 12 years of hands-on experience in Machine Learning/AI engineering and solution delivery across classical ML, generative models, and agentic/LLM-based systems.
- Proven ability to design production ML platforms (data ingestion, training, serving, monitoring, retraining) with scalability, reliability, and cost awareness.
- Deep expertise in system & distributed design: data architectures, feature stores, model serving, streaming/batch pipelines, autoscaling, retries/poison-pill handling, and disaster recovery.
- Strong MLOps and DevOps experience: CI/CD for models, monitoring (data and model drift), A/B testing, canary deployment, and rollback strategies.
- Strong practical experience in classical ML, deep learning, and modern LLM/agentic systems (RAG, fine-tuning, evaluation, guardrails) using Python and modern ML frameworks (PyTorch/TensorFlow).
- Experience with CI/CD for models, containerization (Docker/Kubernetes), model serving, monitoring, drift detection, and automated retraining pipelines.
- Strong coding and service design (APIs/microservices), testing practices, high-availability design, observability, and incident handling for live AI systems.
- Experience mentoring/leading senior engineers and small cross-functional teams; comfortable as the technical owner across several concurrent initiatives.
- Prior experience publishing research and participating in IP creation (patent filings, trade secrets) is required.
- Excellent communication skills; able to present technical tradeoffs to both engineering and executive stakeholders.
Preferred skills
- Background with reinforcement learning, foundation models, LLMs, and agent orchestration frameworks.
- Hands-on experience with cloud platforms (AWS/GCP) and on-prem/hybrid deployments.
- Strong software engineering fundamentals: scalable microservices, API design, security best practices, and cost optimization.
- Familiarity with optimization/OR techniques and integrating them with ML pipelines.
System-design expectations
- Lead architecture reviews and design sessions; produce clear system diagrams, component ownership, latency/capacity budgets, cost estimations, and failure-mode analyses.
- Define data contracts, SLAs, service-level objectives, and monitoring thresholds for every deliverable.
- Ensure designs are modular, testable, and observable, with clear automation for deployment, rollback, and incident response.
- Make pragmatic architecture choices: prefer simpler solutions that meet business needs and constrain cost/dependencies; justify when heavy engineering is necessary.
Additional Information:
Because Western Digital thrives on the power of diversity and is committed to an inclusive environment where every individual can thrive through a sense of belonging, respect, and contribution, we are committed to giving every qualified applicant and employee an equal opportunity. Western Digital does not discriminate against any applicant or employee based on their protected class status and complies with all federal and state laws against discrimination, harassment, and retaliation, as well as the laws and regulations set forth in the "Equal Employment Opportunity is the Law" poster.
Western Digital thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect, and contribution.
Western Digital is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us by email to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting, as well as the job title and requisition number of the position for which you are applying.
Notice To Candidates: Please be aware that Western Digital and its subsidiaries will never request payment as a condition for applying for a position or receiving an offer of employment. Should you encounter any such requests, please report them immediately to the Western Digital Ethics Helpline or by email.
Remote Work:
No
Employment Type:
Full-time