Systems PhD Software Engineer

Databricks

Job Location:

Seattle, WA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

P-1299

Databricks is radically simplifying the entire data lifecycle, from ingestion to generative AI and everything in between. We're doing it cross-cloud with a unified platform, currently serving over 10k customers processing exabytes of data/day on 15 million VMs and growing exponentially.

To make it happen, we're building multi-cloud systems at every corner of the data ecosystem, from query engines, vector databases, training pipelines, and storage systems down to the infrastructure that allows them to scale, like auto-sharders, caches, and load balancers, just to name a few. We also build and support the tooling, languages, and stacks that bring it together. Basically, we do it all.

The space we work in and the problems we solve are massive, complex, and very deep (our published work on Lakehouse, Delta Lake, and Photon is a testament to that). We're looking for practitioners who are eager to work with the best in industry to push the boundaries of what's possible for our customers. If you're truth-seeking, data-driven, and love to operate from first principles (head fake: our core values), then Databricks is the place for you.

As part of the Database Engine team, you will have opportunities to design and implement in many areas that leapfrog existing state-of-the-art systems:

Query compilation & optimization
Distributed query execution and scheduling
Vectorized engine execution
Data security
Resource management
Transaction coordination
Efficient storage structures (encodings, indexes)
Automatic physical data optimization

What we look for:

PhD in databases or systems
A passion for database systems, storage systems, distributed systems, language design, and/or performance optimization
Motivated by delivering customer value and impact
