Requirements
Join Our SRE Dream Team!
Are you a tech wizard who thrives on keeping systems humming and scaling new heights? We're on the hunt for a Site Reliability Engineer (SRE) to join our dynamic, fast-paced crew. If you're passionate about building bulletproof infrastructure and automating the world, this is your chance to shine!
What's the Gig?
You'll be the mastermind behind systems that power thousands of users, ensuring they're reliable, scalable, and ready for action. From architecting monitoring solutions to leading incident response, you'll shape the future of our tech stack while mentoring the team and driving innovation.
Your Superpowers (Essential Skills)
- Strong understanding of networking fundamentals
- Skilled with AWS
- Proficiency in at least one programming language: Python, Go, or JavaScript/TypeScript
- Understanding of containerization (Docker) and orchestration principles
- Experience with monitoring and alerting systems
- Understanding of CI/CD principles
- Version control with Git
- Any additional responsibilities assigned in the Agile Working Model (AWM) Charter
Your Mission, Should You Choose to Accept It
Core responsibilities:
- System Reliability: Design and implement robust, scalable infrastructure solutions
- Observability: Architect and maintain comprehensive monitoring solutions
- Automation: Create automated workflows to reduce toil and human error
- Incident Management: Lead incident response and drive systematic improvements
- Technical Leadership: Mentor team members and influence technical decisions
- Tool Development: Build internal tools that enhance operational efficiency
- Cross-team Collaboration: Work closely with development teams to improve service reliability
- Best Practices: Establish and enforce SRE best practices across the organization