Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. In this role, you will own the full lifecycle of our services, from designing fault-tolerant, maintainable systems to deploying, operating, and continuously improving them in production. As a robotics company, Zoox embraces automation at every layer of our infrastructure, and you'll help drive that ethos forward. You'll work hands-on with systems that process massive volumes of data and support compute-intensive pipelines running on both CPUs and GPUs.
In this role you will:
Architect and optimize scalable systems: You will design, implement, and continuously improve highly reliable infrastructure, directly impacting the success and safety of Zoox's autonomous vehicle platform.
Build proactive monitoring solutions: You will develop advanced monitoring, alerting, and reporting tools to ensure potential issues are identified and resolved before they affect production.
Collaborate across engineering: You will partner closely with software engineering teams to elevate our system architecture, streamline deployment processes, and drive automation initiatives.
Lead incident resolution: You will conduct thorough root cause analyses on production issues and rapidly deploy corrective actions to maintain a resilient and stable environment.
Ensure business continuity: You will safeguard the company's operations by designing and implementing robust disaster recovery plans to keep the Zoox fleet running smoothly under any circumstances.
Qualifications
SRE & Distributed Systems Experience: 5 years of experience in site reliability engineering or a similar role, with a strong background in managing large-scale distributed systems.
Cloud & Infrastructure as Code (IaC): Proven experience operating within major cloud platforms (AWS, GCP, or Azure) and utilizing IaC tools like Terraform, Ansible, Salt, or CloudFormation.
Container Orchestration: Technical expertise in deploying, managing, and scaling systems using container orchestration technologies such as Kubernetes.
Core Infrastructure Knowledge: Deep foundational understanding of networking protocols, storage solutions, and database technologies.
Programming Proficiency: Strong, demonstrable programming and scripting skills in languages such as Python, Go, C/C++, or Java.
About Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We're looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.
Accommodations
A Final Note:
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Required Experience:
IC