Site Reliability Engineer

1 Vacancy
Job Location

Orlando, FL - USA

Monthly Salary

Not Disclosed

Job Description

Full-time
Description

We are seeking an experienced and strategic Site Reliability Engineer (SRE) to drive the stability, reliability, and observability of our mission-critical systems. This role is crucial to ensuring high availability, performance, and operational excellence for our services. The SRE will be responsible for designing and implementing robust reliability frameworks, overseeing system monitoring and incident response, and leading key initiatives to improve system performance.


This role requires a strong leadership mindset, balancing proactive risk mitigation with rapid incident response. The ideal candidate will work closely with engineering, operations, and leadership teams to define and uphold service-level objectives (SLOs) and optimize system resilience.



Key Responsibilities & Objectives

  • Develop and enforce service-level indicators (SLIs) and objectives (SLOs) to measure and improve system health.
  • Implement and manage comprehensive observability strategies, ensuring real-time visibility into system performance, availability, and health.
  • Oversee incident management and response processes, ensuring quick mitigation of production issues and leading postmortem investigations to drive systemic improvements.
  • Optimize system reliability through failure analysis, capacity planning, and proactive risk assessment.
  • Define and implement best practices for on-call management, reducing alert fatigue while ensuring critical issues are addressed efficiently.
  • Assist with writing root cause analyses (RCAs) by providing technical details of the incident.
  • Continuously refine operational runbooks, incident response plans, and system reliability guidelines to enhance organizational resilience.
  • Analyze system performance trends, production issues, and historical outages to proactively address weaknesses before they impact customers.
  • Drive cultural change within the organization, promoting a reliability-first mindset across all teams.
Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5 years of experience in a Site Reliability Engineering, Production Engineering, or Systems Engineering role.
  • Proven expertise in managing high-availability distributed systems in a production environment.
  • Deep understanding of observability practices, including monitoring, logging, and tracing, with tools such as Prometheus, Grafana, Datadog, New Relic, or OpenTelemetry.
  • Extensive experience in incident response, RCAs, postmortems, and continuous improvement processes.
  • Strong background in capacity planning, load balancing, and performance tuning for large-scale applications.
  • Experience with operational leadership, on-call management, and defining reliability strategies within complex environments.
  • Familiarity with networking, security best practices, and risk management strategies for distributed architectures.
  • Strong analytical and problem-solving skills to diagnose system failures and implement long-term solutions.


Preferred Skill Set

  • Incident Management & Alerting: Experience with Jira Service Management, PagerDuty, Opsgenie, or equivalent tools.
  • Cloud Infrastructure Management: Hands-on expertise with AWS, GCP, or Azure.
  • Database Performance Optimization: Experience working with relational and NoSQL databases.
  • Capacity Planning & Scalability Strategies: Ability to assess and predict infrastructure needs for growth.
  • Technical Leadership & Communication: Proven ability to work cross-functionally and drive reliability initiatives at scale.

Employment Type

Full-Time
