Description
Our team is focused on modernizing the Electronic Health Record (EHR) to empower the front line of health care to work at the top of their license, focus more on patients and less on the computer, and achieve peak efficiency, supported by the power of generative AI and modernized applications. Our approach to modernizing is to invest in new capabilities that provide cutting-edge AI and user experience advancements and offer open APIs for customers and third parties to create innovative, integrated solutions. As a Senior Site Reliability Engineer, you will play a pivotal role in designing, deploying, and optimizing Oracle Health applications. You will work in an innovative, dynamic, and collaborative team. If you're passionate about revolutionizing patient care and want to be at the forefront of healthcare technology, join us to make a meaningful difference in global healthcare.
Responsibilities
- Own end-to-end reliability and operational excellence across development, testing, and production environments by partnering closely with Development, QA, Security, and Product teams to ensure seamless code integration, validation, and controlled promotion across environments using standardized release and promotion tooling.
- Design, implement, and continuously improve automated CI/CD and deployment pipelines, identifying opportunities to eliminate manual toil through scripting, tooling, and Infrastructure-as-Code, thereby reducing human error and accelerating safe, repeatable releases.
- Establish and enforce observability best practices, including comprehensive monitoring, logging, tracing, and alerting, to proactively detect anomalies, prevent incidents, and minimize customer impact while enabling rapid root cause analysis.
- Define, measure, and communicate service reliability goals, including service scale, capacity planning, performance characteristics, availability targets, security posture, and compliance requirements across the full technology stack.
- Apply automation and orchestration principles to manage complex distributed systems at scale, and serve as the final escalation point for unresolved high-severity production issues not yet captured in Standard Operating Procedures (SOPs), driving permanent fixes and documentation.
- Leverage deep knowledge of service topology and inter-service dependencies to diagnose complex failures, design mitigations, and improve system resilience through fault isolation, redundancy, and graceful degradation strategies.
- Influence product and platform architecture decisions by clearly articulating their impact on reliability, scalability, latency, availability, and operational complexity in distributed systems.
- Define and manage SLIs, SLOs, and error budgets, using data-driven insights to balance feature velocity with system reliability and guide engineering prioritization.
- Lead incident response and post-incident reviews, driving blameless postmortems, actionable remediation plans, and long-term reliability improvements to reduce MTTR and recurring failures.
- Champion operational readiness and production standards, ensuring services meet reliability, security, and observability requirements before launch.
- Mentor and guide engineers on SRE principles, promoting a culture of ownership, automation-first thinking, and continuous improvement across Dev and Ops teams.
- Partner with Security and Compliance teams to ensure secure deployments, secrets management, access controls, and audit readiness for enterprise and regulated environments.
Technical Skills
- 5 years of experience in infrastructure engineering or DevOps roles.
- Proficiency in scripting languages such as Bash, Python, or PowerShell for automating tasks and managing infrastructure.
- Strong background in Linux.
- Experience with containerization (Docker, Kubernetes).
- Hands-on experience with Kubernetes, including deployment and management.
- Familiarity with Helm for managing Kubernetes applications and deployments.
- Familiarity with monitoring and logging technologies (e.g., Prometheus, Grafana, Splunk).
- Ability to troubleshoot Linux and Kubernetes environments during deployments.
- Deep knowledge of networking (TCP, UDP, DNS, DHCP, IPSec).
- Experience with Terraform.
- Hands-on expertise with any cloud platform (AWS, OCI, Azure).
- Thorough understanding of DevOps culture and Agile methodology.
- Ability to work effectively in a collaborative, cross-functional team environment.
Qualifications
Career Level - IC3
Required Experience:
Senior IC