Key Responsibilities:
- Ensure high availability, reliability, performance, and scalability of services.
- Implement monitoring, alerting, and observability systems to detect and remediate incidents.
- Automate repetitive tasks to reduce toil and improve developer and operational efficiency.
- Manage infrastructure as code (IaC) using tools such as Terraform and Ansible.
- Participate in on-call rotation, incident response, and postmortem analysis to prevent recurrence.
- Collaborate with development teams to ensure smooth deployment and operation of services.
- Maintain CI/CD pipelines and drive improvements in the release process.
- Define and measure SLOs, SLIs, and error budgets to manage service health.
- Apply security best practices in infrastructure.

Skills & Qualifications:
- 3 to 5 years of experience in SRE, DevOps, or Systems Engineering.
- Experience with cloud platforms such as AWS or Azure.
- Experience with containers and orchestration (Docker, Kubernetes, Helm).
- Solid scripting/coding skills in languages such as Python, Bash, or Go.
- Experience with observability tools (e.g., Prometheus, Grafana, Datadog, ELK, Splunk).
- Deep understanding of Linux/Unix systems and networking.
- Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
- Knowledge of distributed systems, microservices architecture, and REST APIs.