P-186
At Databricks, we are inspired by enabling data teams to solve the world's toughest problems, from security threat detection to cancer drug development. We do this by building and running the world's best data and AI infrastructure platform so our customers can focus on the high-value challenges that are central to their own missions.
Our engineering teams build technical products that fulfill real, important needs in the world. We always push the boundaries of data and AI technology while operating with the security and scale that are important to making customers successful on our platform.
We develop and operate one of the largest-scale software platforms. The fleet consists of millions of virtual machines generating terabytes of logs and processing exabytes of data per day. At our scale, we observe cloud hardware, network, and operating system faults, and our software must gracefully shield our customers from all of them.
As a software engineer on the Runtime Observability team, you will develop observability solutions that provide insights into the health and performance of our products and infrastructure.
You will report directly to the Director of Engineering.
The Impact You Will Have:
- You will collaborate with different teams to identify metrics that allow engineers to observe how well the system and different subcomponents are performing.
- You will build tooling and infrastructure to allow components to emit, log, and aggregate metrics that can be displayed on dashboards and used for alerting.
- You will scale the observability solutions to support millions of instances and billions of queries per day.
- You will develop processes and training for developers and field engineers to debug performance and reliability issues affecting customers.
What We Look For
- BS (or higher degree) in Computer Science or a related field
- 4 years of production-level experience in one of: Java, Scala, C, or a similar language.
- Experience in software development in large-scale distributed systems
- Familiarity with metrics collection, health monitoring, and observability tools
- Experience building relationships with developers and field engineers to facilitate assessment and mitigation of performance and reliability problems.
- 6 years of production-level experience in one of: Java, Scala, C, or a similar language.
- Experience driving large projects involving multiple teams, and the ability to provide appropriate guidance on developing large-scale systems that can handle billions of queries per day.
Required Experience:
Staff IC