Join the Chaos Engineering team in Amazon Search. We perform experiments in production to harden Search against outages and make sure that whenever customers search for products, they find what they are looking for.
In this role you will:
- Design, implement, execute, and automate chaos experiments to continuously test Amazon Search resilience against hardware failures, dependency outages, traffic spikes, and more.
- Collaborate with service owners to remedy vulnerabilities, minimize blast radius, and harden Amazon Search.
- Research tools and practices in resilience engineering and adopt them as appropriate.
Joining this team, you'll experience the benefits of working in an entrepreneurial environment while leveraging the resources of Amazon (AMZN), one of the world's leading internet companies. We are a diverse, customer-obsessed, and passionate team located in Meguro, Tokyo.
Key job responsibilities
- Develop and maintain our chaos experiment orchestrator
- Design, execute, automate, and maintain chaos experiments
- Develop and maintain our distributed load generator
- Develop and maintain our petabyte-scale log archival and query service
- Join a 12/12 on-call rotation for incident response and mitigation
Basic qualifications
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C, C#, or Rust
- Experience with Linux/Unix
- Experience in networking, storage systems, operating systems, and hands-on systems engineering
- Experience with distributed operational health and performance monitoring systems
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit
for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.