Senior Site Reliability Engineer

Oracle


Job Location:

Bengaluru, India

Monthly Salary: Not Disclosed
Posted on: 10 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

Our team is focused on modernizing the Electronic Health Record (EHR) to empower the front line of health care to work at the top of their license, focus more on patients and less on the computer, and achieve peak efficiency, supported by the power of generative AI and modernized applications. Our approach to modernization is to invest in new capabilities that deliver cutting-edge AI user-experience advancements and offer open APIs so that customers and third parties can build innovative, integrated solutions. As a Senior Site Reliability Engineer, you will play a pivotal role in designing, deploying, and optimizing Oracle Health applications. You will work in an innovative, dynamic, and collaborative team. If you're passionate about revolutionizing patient care and want to be at the forefront of healthcare technology, join us to make a meaningful difference in global healthcare.



Responsibilities

  • Own end-to-end reliability and operational excellence across development, testing, and production environments by partnering closely with Development, QA, Security, and Product teams to ensure seamless code integration, validation, and controlled promotion across environments using standardized release and promotion tooling.
  • Design, implement, and continuously improve automated CI/CD and deployment pipelines, identifying opportunities to eliminate manual toil through scripting, tooling, and Infrastructure-as-Code, thereby reducing human error and accelerating safe, repeatable releases.
  • Establish and enforce observability best practices, including comprehensive monitoring, logging, tracing, and alerting, to proactively detect anomalies, prevent incidents, and minimize customer impact while enabling rapid root cause analysis.
  • Define, measure, and communicate service reliability goals, including service scale, capacity planning, performance characteristics, availability targets, security posture, and compliance requirements, across the full technology stack.
  • Apply automation and orchestration principles to manage complex distributed systems at scale, and serve as the final escalation point for unresolved high-severity production issues not yet captured in Standard Operating Procedures (SOPs), driving permanent fixes and documentation.
  • Leverage deep knowledge of service topology and inter-service dependencies to diagnose complex failures, design mitigations, and improve system resilience through fault isolation, redundancy, and graceful degradation strategies.
  • Influence product and platform architecture decisions by clearly articulating their impact on reliability, scalability, latency, availability, and operational complexity in distributed systems.
  • Define and manage SLIs, SLOs, and error budgets, using data-driven insights to balance feature velocity with system reliability and to guide engineering prioritization.
  • Lead incident response and post-incident reviews, driving blameless postmortems, actionable remediation plans, and long-term reliability improvements to reduce MTTR and recurring failures.
  • Champion operational readiness and production standards, ensuring services meet reliability, security, and observability requirements before launch.
  • Mentor and guide engineers on SRE principles, promoting a culture of ownership, automation-first thinking, and continuous improvement across Dev and Ops teams.
  • Partner with Security and Compliance teams to ensure secure deployments, secrets management, access controls, and audit readiness for enterprise and regulated environments.

Technical Skills

  • 5 years of experience in infrastructure engineering or DevOps roles.
  • Proficiency in scripting languages such as Bash, Python, or PowerShell for automating tasks and managing infrastructure.
  • Strong background in Linux.
  • Experience with containerization (Docker, Kubernetes).
  • Hands-on experience with Kubernetes, including deployment and management.
  • Familiarity with Helm for managing Kubernetes applications and deployments.
  • Familiarity with monitoring and logging technologies (e.g., Prometheus, Grafana, Splunk).
  • Experience troubleshooting Linux and Kubernetes environments during deployments.
  • Deep knowledge of networking (TCP, UDP, DNS, DHCP, IPsec).
  • Experience with Terraform.
  • Hands-on expertise with at least one cloud platform (AWS, OCI, Azure).
  • Thorough understanding of DevOps culture and Agile methodology.
  • Ability to work effectively in a collaborative, cross-functional team environment.


Qualifications

Career Level - IC3




Required Experience:

Senior IC


Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company


As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...
