Senior Site Reliability Engineer

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 10 hours ago

Vacancies: 1 Vacancy

Job Summary

Description

Our team is focused on modernizing the Electronic Health Record (EHR) to empower the front line of health care to work at the top of their license, focus more on patients and less on the computer, and achieve peak efficiency, supported by the power of generative AI and modernized applications. Our approach to modernizing is to invest in new capabilities that provide cutting-edge AI user experience advancements and offer open APIs for customers and third parties to create innovative, integrated solutions. As a Senior Site Reliability Engineer, you will play a pivotal role in designing, deploying, and optimizing Oracle Health applications. You will work in an innovative, dynamic, and collaborative team. If you're passionate about revolutionizing patient care and want to be at the forefront of healthcare technology, join us to make a meaningful difference in global healthcare.

Responsibilities

Own end-to-end reliability and operational excellence across development, testing, and production environments by closely partnering with Development, QA, Security, and Product teams to ensure seamless code integration, validation, and controlled promotion across environments using standardized release and promotion tooling.
Design, implement, and continuously improve automated CI/CD and deployment pipelines, identifying opportunities to eliminate manual toil through scripting, tooling, and Infrastructure-as-Code, thereby reducing human error and accelerating safe, repeatable releases.
Establish and enforce observability best practices, including comprehensive monitoring, logging, tracing, and alerting, to proactively detect anomalies, prevent incidents, and minimize customer impact while enabling rapid root cause analysis.
Define, measure, and communicate service reliability goals, including service scale, capacity planning, performance characteristics, availability targets, security posture, and compliance requirements across the full technology stack.
Apply automation and orchestration principles to manage complex distributed systems at scale, and serve as the final escalation point for unresolved high-severity production issues not yet captured in Standard Operating Procedures (SOPs), driving permanent fixes and documentation.
Leverage deep knowledge of service topology and inter-service dependencies to diagnose complex failures, design mitigations, and improve system resilience through fault isolation, redundancy, and graceful degradation strategies.
Influence product and platform architecture decisions by clearly articulating their impact on reliability, scalability, latency, availability, and operational complexity in distributed systems.
Define and manage SLIs, SLOs, and error budgets, using data-driven insights to balance feature velocity with system reliability and guide engineering prioritization.
Lead incident response and post-incident reviews, driving blameless postmortems, actionable remediation plans, and long-term reliability improvements to reduce MTTR and recurring failures.
Champion operational readiness and production standards, ensuring services meet reliability, security, and observability requirements before launch.
Mentor and guide engineers on SRE principles, promoting a culture of ownership, automation-first thinking, and continuous improvement across Dev and Ops teams.
Partner with Security and Compliance teams to ensure secure deployments, secrets management, access controls, and audit readiness for enterprise and regulated environments.

Technical Skills

5 years of experience in infrastructure engineering or DevOps roles.
Proficiency in scripting languages such as Bash, Python, or PowerShell for automating tasks and managing infrastructure.
Strong background in Linux.
Experience with containerization (Docker, Kubernetes).
Hands-on experience with Kubernetes, including deployment and management.
Familiarity with Helm for managing Kubernetes applications and deployments.
Familiarity with monitoring and logging technologies (e.g., Prometheus, Grafana, Splunk).
Troubleshooting within Linux and Kubernetes environments during deployments.
Deep knowledge of networking (TCP, UDP, DNS, DHCP, IPsec).
Experience with Terraform.
Hands-on expertise with any cloud platform (AWS, OCI, Azure).
Thorough understanding of DevOps culture and Agile methodology.
Ability to work effectively in a collaborative, cross-functional team environment.

Qualifications

Career Level - IC3

Required Experience:

Senior IC

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root Cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Oracle

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...