Site Reliability Engineer (SRE)

Zapopan - Mexico

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Description

Site Reliability Engineer

Job Description:

Are you passionate about solving complex distributed systems challenges at scale? Join Oracle as a Site Reliability Engineer and help shape the reliability, scalability, and performance of Oracle Cloud Infrastructure (OCI). As part of the Site Reliability Engineering (SRE) team, you'll contribute to designing, automating, and evolving mission-critical systems. You'll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.

This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.

What We're Looking For:

Advanced Linux systems administration
Strong coding skills in Python (automation-focused)
Intermediate experience with Bash/Shell scripting
Familiarity with networking principles and distributed systems behavior
Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
Understanding of unit testing and modern software engineering practices
Experience with CI/CD pipelines and deployment automation
Comfortable working in Agile development environments

Nice to Have:

Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
Experience building internal tools for operational efficiency
Participation in SRE culture: blameless postmortems, runbooks, and service design reviews

Responsibilities

What You'll Do:

Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
Build tooling to reduce manual operations and eliminate sources of toil.
Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
Review and influence architectural designs for distributed systems, with a focus on resilience, performance, and fault tolerance.
Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).

Qualifications

Career Level - IC3

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root Cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Oracle

Oracle provides the world's most complete, open, and integrated business software and hardware systems, with more than 370,000 customers, including 100 of the Fortune 100, representing a variety of sizes and industries in more than 145 countries around the globe. And Oracle's 110,000 gl ...
