Hi Team, we are looking for a full-time Chaos Engineer.
Location: Arlington, Texas, USA (Day 1 Onsite)
Type: FTE, for USA residents only
Key Responsibilities:
- Chaos Testing and Experimentation: Design and execute chaos engineering experiments to identify weaknesses in systems and improve resilience.
- System Analysis: Analyze system behavior under stress conditions and develop strategies to mitigate potential failures.
- Performance Monitoring: Continuously monitor system performance, identify vulnerabilities, and implement improvements to enhance resilience.
- Collaboration and Training: Work with cross-functional teams to understand system requirements and provide training on chaos engineering principles and best practices.
- Documentation: Develop and maintain comprehensive documentation on chaos experiments, findings, and mitigation strategies.
Experience:
- 5 years of experience in software engineering or system reliability engineering with a focus on chaos engineering and resilience testing.
Technical Skills:
- Proficiency in chaos engineering tools (e.g., Chaos Monkey, Gremlin) and scripting languages (e.g., Python, Bash).
- Experience with cloud platforms (e.g., AWS, Azure) and container orchestration (e.g., Kubernetes).
- Knowledge of distributed systems and microservices architecture.
- Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana).
- Experience with incident response and root cause analysis.
Desired Skills:
- Strong analytical and problem-solving abilities.
- Excellent communication and collaboration skills to interact with technical and non-technical stakeholders.
- Ability to mentor and train junior engineers on chaos engineering practices.
- Commitment to continuous learning to stay updated with the latest chaos engineering techniques and tools.
- Familiarity with tools such as Jenkins, Ansible, and Terraform.