As the Platform Observability Engineering Manager within Ford's Data Platforms and Engineering (DP&E) organization, you will lead a team responsible for building and maintaining a best-in-class platform for monitoring and observability. This platform will focus on the four golden signals (latency, traffic, errors, and saturation), providing critical data to support operations, root cause analysis, continuous improvement, and cost optimization initiatives. Working closely with platform architects, you will design, develop, and maintain a scalable and reliable platform, ensuring seamless integration with systems used by various teams across the organization.
Your leadership will be key in mentoring your engineering team; driving improvements in MTTR and MTTX through increased visibility into system performance; collaborating with stakeholders to integrate observability data into their workflows; developing insightful dashboards and reports; continuously improving platform performance and reliability; optimizing costs; and staying current with industry best practices and technologies.
The ideal candidate possesses extensive experience managing large-scale, high-availability systems; a deep understanding of the four golden signals; experience with monitoring tools (Prometheus, Grafana, Jaeger, etc.); strong leadership and communication skills; and, ideally, experience with cloud platforms (AWS, Azure, GCP). This role prioritizes building and maintaining a robust platform; it is not focused on developing individual monitoring tools. Instead, it focuses on creating a centralized, reliable source of observability data that empowers data-driven decisions and accelerates incident response across the organization.
I. Engineering Leadership & Management:
Proven experience (7 years) in a leadership role managing engineering teams, ideally with a focus on platform engineering, platforms, observability, or similar areas. Experience managing remote teams is a plus.
Experience leading and mentoring engineering teams, fostering a culture of innovation, continuous learning, and technical excellence. Demonstrated ability to drive strategic technical decisions and ensure alignment with broader organizational goals.
Proven ability to build and maintain high-performing teams, promoting accountability, ownership, and collaboration. Experience with performance management, including conducting performance reviews and providing constructive feedback.
Excellent communication and interpersonal skills, with a proven ability to cultivate cross-functional collaboration and build strong relationships with stakeholders at all levels.
II. Agile & Scrum Practices:
Deep understanding of and practical experience with Agile methodologies (Scrum, Kanban), including facilitating daily standups, sprint planning, backlog grooming, and sprint retrospectives.
Experience working closely with Product Managers to align engineering efforts with product goals, ensure well-defined user stories, and manage priorities effectively.
Proven ability to ensure engineering rigor in story hygiene, including clear acceptance criteria, well-defined dependencies, and a focus on deliverability within the sprint.
III. Technical Expertise & Accountability:
Deep understanding of platform engineering principles and experience designing, building, and maintaining scalable and reliable infrastructure for ML workloads.
Expertise in DevOps practices, including CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions), infrastructure-as-code (Terraform, Ansible, CloudFormation), and automation.
Proficiency in at least one programming language (e.g., Python, Java) sufficient to effectively communicate with and guide your engineering team. You won't be expected to contribute to team capacity by coding, but you need to be able to speak the language of your engineers.
Strong understanding of cloud solutions and offerings (preferably GCP services: Compute Engine, Kubernetes Engine, Cloud Functions, BigQuery, Pub/Sub, Cloud Storage, Vertex AI). Experience with other major cloud providers (AWS, Azure) is also valuable.
Experience with designing and implementing microservices and serverless architectures. Experience with containerization (Docker, Kubernetes) is highly beneficial.
Experience with monitoring and optimizing platform performance, ensuring systems are running efficiently and meeting SLAs. Proven ability to lead incident management efforts and implement continuous improvements to enhance reliability.
Commitment to best engineering practices, including code reviews, testing, and documentation. A focus on building maintainable and scalable systems is essential.
IV. Operational Excellence & Cost Optimization:
Proven ability to drive cost optimization initiatives, particularly in cloud infrastructure and resource usage, aligning with Ford's broader cost-reduction goals.
Experience tracking and reporting key metrics for your domain/platform related to team performance, including quality and operational efficiency.
Required Experience:
Manager
Full-Time