Description
As part of the Site Reliability Engineering (SRE) team, you'll contribute to designing, automating, and evolving mission-critical systems. You'll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.
This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.
Responsibilities
What You'll Do:
- Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
- Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
- Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
- Build tooling to reduce manual operations and eliminate sources of toil.
- Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
- Review and influence architectural designs for distributed systems, with a focus on resilience, performance, and fault tolerance.
- Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
- Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).
What We're Looking For:
- Advanced Linux systems administration
- Strong coding skills in Python (automation-focused)
- Intermediate experience with Bash/Shell scripting
- Familiarity with networking principles and distributed systems behavior
- Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
- Understanding of unit testing and modern software engineering practices
- Experience with CI/CD pipelines and deployment automation
- Comfortable working in Agile development environments
Nice to Have:
- Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
- Experience building internal tools for operational efficiency
- Participation in SRE culture: blameless postmortems, runbooks, and service design reviews
Qualifications
Career Level - IC4