Sr Software Engineer, Infrastructure
San Francisco, CA - USA
Job Summary
GAQ127R41
Location: San Francisco, CA (Hybrid)
Team: IT Infrastructure and Operations
About the Role
At Databricks Information Technology, we are a product-led organization transforming the way we work, from how easy it is to use our IT services to the applications we develop that help us scale seamlessly in the face of incredible growth.
As a Senior Software Engineer (Infrastructure), you will be a core technical contributor on the IT Infrastructure team, owning and driving the evolution of our core infrastructure and observability platforms. This role requires a strong software engineering mindset, deep technical breadth across the SRE and infrastructure worlds, and the ability to deliver high-quality, scalable solutions to problems in currently immature systems. You will be responsible for building resilient, scalable, and automated infrastructure that empowers our development teams. As a senior member of the team, you will bridge the gap between software engineering and systems architecture, ensuring our AWS environment is cost-optimized, secure, and highly available.
The Impact You Will Have
- Architect and Automate: Design and deploy production-grade infrastructure on AWS using Terraform or Pulumi.
- Orchestration: Manage and scale containerized workloads using AKS (Azure Kubernetes Service) or EKS, focusing on cluster security and resource efficiency.
- CI/CD Excellence: Architect robust deployment pipelines using GitHub Actions, managing both GitHub-hosted and self-hosted runners for specialized build requirements.
- Drive Observable-by-Default Frameworks: Create the underlying infrastructure to ensure new internal applications are secure and have logging and metrics enabled by default.
- Tooling, Scripting & AI: Build internal CLI tools, AI plugins, and automation scripts to streamline developer workflows and enhance operational efficiency.
- Partner Cross-Functionally: Collaborate with stakeholders across Security, Engineering, Infrastructure, and Support to deliver impactful projects with real business outcomes.
- Mentor and Document: Participate in code reviews, document solutions and failure triage playbooks, and mentor junior engineers on the platforms you own.
What We Look For
- Software Engineering Expertise: 5 years of production-level experience with strong proficiency in Python (non-negotiable).
- IaC: Expert-level proficiency in Terraform (modules, state management) or Pulumi (preferred).
- Cloud & Infrastructure Breadth: Hands-on experience with AWS (or Azure/GCP), Kubernetes, Docker, and containerization concepts.
- Automation & Integration Mindset: Experience building and troubleshooting integrations between infrastructure, data pipelines, and observability platforms.
- CI/CD: Advanced knowledge of GitHub Actions and GitHub runners.
- Strong Observability Mindset: Understanding of the observability pillars (logging, metrics, tracing) and hands-on experience with tools like Datadog, Prometheus, or ELK.
- Distributed Systems: Proficiency in running distributed systems built on technologies such as Kafka or other message queues.
- Independent Execution: Ability to operate with minimal guidance, take ownership of ambiguous projects, and follow a vision set by tech leads to execute independently.
Required Experience:
Senior IC
About Company
The Databricks Platform is the world’s first data intelligence platform powered by generative AI. Infuse AI into every facet of your business.