Staff Backline Engineer (Apache Spark)

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers, we leap at every opportunity to tackle technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.

About the Team

The Backline Engineering Team serves as the critical bridge between Engineering and Frontline Support. We handle complex technical issues and escalations across the Apache Spark ecosystem. With a strong focus on customer success, we are committed to delivering exceptional customer satisfaction by providing deep technical expertise, proactive issue resolution, and continuous improvements to the platform. We emphasise automation and tooling to enhance troubleshooting efficiency, reduce manual effort, and improve the overall supportability of the platform. By developing smart solutions and streamlining workflows, we drive operational excellence and ensure a seamless experience for both customers and internal teams.

The impact you will have

Troubleshoot and resolve complex customer issues related to Apache Spark core internals, Spark SQL, Structured Streaming, Databricks Delta, and the Databricks AI product stack.
Perform in-depth code-level analysis of customer applications to identify root causes and deliver actionable solutions.
Conduct comprehensive performance analysis of Spark workloads, identifying opportunities to improve latency, throughput, and resource utilisation.
Optimise Spark applications by tuning configurations, refining caching strategies, and adjusting execution parameters.
Review Spark job code for adherence to best practices in performance, scalability, and maintainability, and provide recommendations.
Resolve performance issues in Spark applications, including slow queries, failed jobs, memory leaks, and other anomalies.
Build troubleshooting guides and runbooks to support the team.
Develop notebook-based repro scenarios and Data Engineering pipelines to demonstrate feature capabilities.
Collaborate with the Spark Engineering team to raise awareness of upcoming features and releases and identify bugs with potential workarounds.
Coordinate with engineering and escalation teams to ensure the timely resolution of customer issues.
Participate in both weekend and weekday on-call rotations.

Competencies

10 years of industry experience developing, testing, and sustaining Python-, Java-, or Scala-based applications.
Comfortable with compiling, building, and navigating the Apache Spark source code.
Comfortable with identifying and applying patches/bug fixes to the Apache Spark source code.
Experience in Big Data/Hadoop/Spark/Kafka/Elasticsearch data pipelines.
Hands-on experience with SQL-based database systems.
Experience performing complex troubleshooting steps, such as profiling services, analysing metrics, and processing dumps, to identify and resolve issues.
Ability to review source code and identify the root cause of issues in one or more distributed environments.
Strong understanding of distributed systems and their internal implementation.
Experience with JVM GC and thread-dump-based troubleshooting is required.
Experience with AWS or Azure-related services.
A bachelor's degree in Computer Science or a related field is required.

Required Experience:

Staff IC
