Engineering Manager, Platform Reliability
Job Summary
P-1535
At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers, and customer obsessed, we leap at every opportunity to solve technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.
The Lakebase Platform Reliability team's footprint spans multiple stacks, systems, and stakeholders. Its work includes AI-powered tooling and workflows for customer management, real-time observability during incidents, monitoring and auditing systems that underpin compliance requirements, and customer-facing operational APIs and maintenance workflows. You'll contribute to the wider platform mission: building resource management infrastructure, reliable distributed services, and internal tools that help Databricks engineers operate confidently across clouds and environments.
The impact you will have:
- Hire great engineers to build an outstanding team.
- Support engineers in their career development by providing clear feedback, and develop engineering leaders.
- Ensure high technical standards by instituting processes (architecture reviews, testing) and culture (engineering excellence).
- Work with engineering and product leadership to build a long-term roadmap.
- Coordinate execution and collaborate across teams to unblock cross-cutting projects.
- Build resource management infrastructure that powers the big data and machine learning workloads on the Databricks platform in a scalable, secure, and cloud-agnostic way.
- Lead development of reliable, scalable services and client libraries that work with massive amounts of data on the cloud, across geographic regions and cloud providers.
- Build tools that allow Databricks engineers to operate their services across different clouds and environments.
- Build services, products, and infrastructure at the intersection of machine learning and distributed systems.
What we look for:
- 5 years of engineering experience and 2 years of engineering management experience.
- Experience with large-scale distributed services and the processes around testing, monitoring, and SLAs.
- Ability to align multiple stakeholders on competing priorities.
- Ability to balance short-term delivery against long-term stability.
- BS (or higher) in Computer Science or a related field.
Required Experience:
Manager
About Company
The Databricks Platform is the world’s first data intelligence platform powered by generative AI. Infuse AI into every facet of your business.