Sr. Manager – Data & AI Support Engineering
Job Summary
P-1388
As a Sr. Manager of the Data & AI Support Engineering team, you will lead and manage a team of Technical Solutions Engineers responsible for driving deep technical resolutions for complex customer issues across Spark, AI/ML, Streaming, and Lakehouse platforms. You will help customers realize business value from Databricks ecosystem products through strong technical leadership, AI-first operational innovation, and customer-centric execution.
Mission
Lead and scale a world-class, AI-first Data & AI Support Engineering organization that combines deep technical expertise, operational excellence, intelligent automation, and customer-centric support to accelerate issue resolution, improve platform reliability, and drive exceptional customer outcomes across enterprise-scale Data & AI workloads.
- Build AI-enabled support workflows and reusable automations to improve resolution speed and support quality.
- Use agentic AI systems, logs, telemetry, observability platforms, and internal systems to accelerate troubleshooting and root-cause analysis safely.
- Create reusable runbooks, prompts, and agentic workflows that scale operational efficiency across teams.
- Ensure strong AI governance, customer data safety, validation practices, auditability, and human-in-the-loop controls.
- Partner with Engineering and Product teams to drive AI-first support innovation and operational excellence.
Outcomes
- Drive AI-first support transformation initiatives that improve resolution speed, case quality, operational efficiency, and customer experience.
- Partner with Engineering and Product teams to operationalize AI-assisted diagnostics, observability insights, and intelligent escalation management for enterprise customers.
- Build and scale reusable AI-enabled workflows, automations, runbooks, and operational intelligence frameworks across the support organization.
- Lead and manage Technical Solutions Engineers, Team Leads, and support operations personnel across AMER support functions based out of the Dallas location.
- Own and improve operational KPIs, including customer satisfaction, escalation management, backlog health, resolution efficiency, and support quality.
- Act as a senior escalation point for customers and internal teams while driving operational excellence and process optimization.
- Lead hiring, onboarding, mentoring, technical assessments, training, and career development for support engineers and technical leads.
- Conduct regular one-on-ones, annual reviews, and career development discussions with direct reports.
- Be a hands-on technical leader supporting complex issues related to Spark Core, Spark SQL, Structured Streaming, Delta Lake, Lakehouse architecture, and Databricks Runtime technologies.
- Guide customers on Spark runtime optimization, distributed systems performance, and best practices for scalable Data & AI workloads.
- Own Engineering JIRA escalations and proactively drive faster resolutions for customer-reported product issues.
- Maintain internal operational documentation, runbooks, and customer-facing knowledge base assets.
- Coordinate closely with Engineering, Backline Support Engineering, and Customer Experience Intelligence teams to identify, reproduce, and report product defects effectively.
- Act as a strong customer advocate and collaborate with cloud partners to support mutual customer success.
- Participate in major incident management, escalation handling, on-call rotations, and critical production support activities.
What we are looking for:
- 10 years of experience designing, building, troubleshooting, and supporting large-scale Data & AI applications using Python, Java, Scala, Spark, or related distributed technologies.
- Strong work experience with AI-enabled support workflows, agentic AI systems, Claude Skills workflows, RAG architectures, vector databases, and other operational automation frameworks.
- Proven development and delivery experience at production scale on Databricks technology stacks such as Model Serving, Lakehouse, Delta, DLT, Lakeflow, and Lakebase is a strong plus.
- Experience using AI tools for troubleshooting, root-cause analysis, observability analysis, and support workflow acceleration.
- Strong hands-on expertise in Apache Spark, Spark SQL, Structured Streaming, Delta Lake, and distributed data processing systems.
- Experience leading production-scale workloads across Big Data, Hadoop, AI/ML, Kafka, Streaming, Data Science, or Analytics platforms.
- Strong troubleshooting and performance tuning experience for Spark and JVM-based distributed systems, including memory management, garbage collection, heap analysis, and thread dump analysis.
- Hands-on experience with AWS, Azure, or GCP cloud platforms.
- Proven experience managing globally distributed technical teams and handling high-severity customer escalations.
- Strong analytical, debugging, problem-solving, and distributed systems troubleshooting skills.
- Excellent written and verbal communication skills with strong customer-facing leadership abilities.
- Strong organizational, multitasking, stakeholder management, and operational leadership capabilities.
Required Experience:
Manager
About Company
The Databricks Platform is the world’s first data intelligence platform powered by generative AI. Infuse AI into every facet of your business.