SRE Engineer

V R Della Infotech Inc

Job Location:

San Diego, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Description

We're looking for a Site Reliability Engineer (SRE) to join our Global SRE team. In this role, you'll blend software engineering and systems engineering to help ensure our large-scale distributed digital products are reliable, scalable, and efficient. You'll work closely with software, platform, and product teams to design, build, and operate systems that support Resmed's customers worldwide.

Responsibilities

Ensure the reliability, availability, and resiliency of Resmed's digital products by designing and operating fault-tolerant systems
Partner with product and platform teams to define and improve service health using operational and customer-experience metrics
Design, implement, and maintain monitoring, alerting, logging, and tracing solutions that provide real-time visibility into system behavior and customer experience
Analyze system performance, scalability, and capacity, and drive optimizations to improve efficiency and stability in cloud environments
Build automation and tooling to support deployments, scaling, incident response, and operational workflows
Participate in an on-call rotation as part of a globally distributed team, lead incident response efforts, troubleshoot production issues, conduct postmortems, and drive continuous improvement initiatives
Collaborate with security and compliance partners to support secure, privacy-aware, and compliant operations
Work closely with engineering teams to improve developer experience, operational maturity, and overall customer experience

Qualifications

Experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
Experience operating Kubernetes-based production systems
Hands-on experience with AWS and infrastructure-as-code tools
Experience designing and supporting CI/CD pipelines and automated deployments
Proficiency in Python for automation, tooling, or backend services
Solid understanding of distributed systems and networking concepts
Experience with monitoring and observability platforms such as Datadog and CloudWatch
