Senior Manager – Site Reliability Engineering (SRE)

Ohm Systems

Job Location:

Woonsocket, RI - USA

Monthly Salary: Not Disclosed

Posted on: Yesterday

Vacancies: 1 Vacancy

Job Summary

Job Description

Role Summary

We are seeking a Senior Manager of Site Reliability Engineering (SRE) to help drive the activation, structure, and scaling of SRE practices across the Financial Services & Innovation (FS&I) organization.

This role is responsible for establishing operational discipline, driving adoption of SRE standards, and aligning application teams, Production Support Engineering (PSE), and platform teams to a consistent reliability model.

The ideal candidate brings a combination of technical depth, organizational leadership, and execution rigor, with proven experience implementing SRE practices in complex enterprise environments.

Key Responsibilities

SRE Activation & Operating Model

Drive adoption of the SRE operating model across application teams

Establish clarity in roles between:

Production Support Engineering (PSE)

Application teams

Ensure SRE practices are embedded into the development lifecycle, not treated as post-production activities

Reliability Standards & Governance

Define and enforce:

SLIs, SLOs, and error budgets

Production readiness criteria

Reliability best practices

Lead SLO adoption and compliance reviews across the organization

Establish governance frameworks to ensure consistent application of standards

Cross-Team Coordination & Enablement

Partner with:

Application product teams

Production Support Engineering (MG team)

Platform / Infrastructure / Observability teams

Drive alignment and reduce friction between engineering and operations

Ensure clear handoffs, escalation models, and operational ownership

Observability & Monitoring Strategy

Lead adoption of centralized observability standards across:

Metrics

Logging

Tracing

Align tooling (AppDynamics, Splunk, Prometheus, etc.)

Ensure monitoring and alerting are SLO-driven and actionable, not noise-based

Incident Management & Continuous Improvement

Partner with PSE to strengthen:

Incident management processes

RCA (Root Cause Analysis) standards

Drive identification of patterns and systemic issues

Ensure learnings translate into engineering improvements and automation

Automation & Reliability Engineering

Identify opportunities to:

Reduce manual operational work

Improve system resilience

Enable self-healing capabilities

Promote a culture of engineering over reaction

Reporting & Organizational Insight

Define and track reliability metrics across FS&I

Build reporting that provides visibility into:

System health

Incident trends

SLO performance

Translate technical data into actionable business insights

Required Qualifications

10 years in engineering, operations, or SRE roles

5 years leading SRE, platform, or reliability-focused teams

Proven experience implementing SRE practices at scale (SLIs, SLOs, error budgets)

Strong background in cloud environments (AWS, Azure, GCP)

Hands-on experience with observability tools (Splunk, AppDynamics, Prometheus, etc.)

Experience in incident management and production operations at scale

Ability to operate effectively in high-pressure and complex enterprise environments

Preferred Qualifications

Experience driving organizational transformation (not just technical implementation)

Strong understanding of CI/CD, DevOps, and automation practices

Experience working in regulated or large enterprise environments

Familiarity with AIOps or advanced automation strategies

Key Success Indicators

Increased adoption of SLOs and reliability standards

Reduction in high-severity incidents over time

Improved MTTR and operational efficiency

Increased adoption of standardized observability practices

Reduction in reactive, ticket-driven work across teams

Clear alignment between SRE, PSE, and application teams

Core Competencies

Strategic thinking with strong execution focus

Ability to drive alignment across multiple teams and stakeholders

Strong communication and influence skills

Bias toward structure, clarity, and accountability

Ability to operate with urgency and discipline in complex environments
