Job Description
We're looking for a Site Reliability Engineer (SRE) to join our Global SRE team. In this role, you'll blend software engineering and systems engineering to help ensure our large-scale distributed digital products are reliable, scalable, and efficient. You'll work closely with software, platform, and product teams to design, build, and operate systems that support Resmed's customers worldwide.
Responsibilities
- Ensure the reliability, availability, and resiliency of Resmed's digital products by designing and operating fault-tolerant systems
- Partner with product and platform teams to define and improve service health using operational and customer-experience metrics
- Design, implement, and maintain monitoring, alerting, logging, and tracing solutions that provide real-time visibility into system behavior and customer experience
- Analyze system performance, scalability, and capacity, and drive optimizations to improve efficiency and stability in cloud environments
- Build automation and tooling to support deployments, scaling, incident response, and operational workflows
- Participate in an on-call rotation as part of a globally distributed team, lead incident response efforts, troubleshoot production issues, conduct postmortems, and drive continuous improvement initiatives
- Collaborate with security and compliance partners to support secure, privacy-aware, and compliant operations
- Work closely with engineering teams to improve developer experience, operational maturity, and overall customer experience
Qualifications
- Experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
- Experience operating Kubernetes-based production systems
- Hands-on experience with AWS and infrastructure-as-code tools
- Experience designing and supporting CI/CD pipelines and automated deployments
- Proficiency in Python for automation, tooling, or backend services
- Solid understanding of distributed systems and networking concepts
- Experience with monitoring and observability platforms such as Datadog and CloudWatch