Senior Manager – Site Reliability Engineering (SRE)

Ohm Systems


Job Location:

Woonsocket, RI - USA

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Description

Role Summary

We are seeking a Senior Manager of Site Reliability Engineering (SRE) to help drive the activation, structure, and scaling of SRE practices across the Financial Services & Innovation (FS&I) organization.

This role is responsible for establishing operational discipline, driving adoption of SRE standards, and aligning application teams, Production Support Engineering (PSE), and platform teams to a consistent reliability model.

The ideal candidate brings a combination of technical depth, organizational leadership, and execution rigor, with proven experience implementing SRE practices in complex enterprise environments.

Key Responsibilities

SRE Activation & Operating Model

  • Drive adoption of the SRE operating model across application teams
  • Establish clarity in roles between:
      • SRE
      • Production Support Engineering (PSE)
      • Application teams
  • Ensure SRE practices are embedded into the development lifecycle, not treated as post-production activities

Reliability Standards & Governance

  • Define and enforce:
      • SLIs, SLOs, and error budgets
      • Production readiness criteria
      • Reliability best practices
  • Lead SLO adoption and compliance reviews across the organization
  • Establish governance frameworks to ensure consistent application of standards

Cross-Team Coordination & Enablement

  • Partner with:
      • Application product teams
      • Production Support Engineering (MG team)
      • Platform / Infrastructure / Observability teams
  • Drive alignment and reduce friction between engineering and operations
  • Ensure clear handoffs, escalation models, and operational ownership

Observability & Monitoring Strategy

  • Lead adoption of centralized observability standards across:
      • Metrics
      • Logging
      • Tracing
  • Align tooling (AppDynamics, Splunk, Prometheus, etc.)
  • Ensure monitoring and alerting are SLO-driven and actionable, not noise-based

Incident Management & Continuous Improvement

  • Partner with PSE to strengthen:
      • Incident management processes
      • RCA (Root Cause Analysis) standards
  • Drive identification of patterns and systemic issues
  • Ensure learnings translate into engineering improvements and automation

Automation & Reliability Engineering

  • Identify opportunities to:
      • Reduce manual operational work
      • Improve system resilience
      • Enable self-healing capabilities
  • Promote a culture of engineering over reaction

Reporting & Organizational Insight

  • Define and track reliability metrics across FS&I
  • Build reporting that provides visibility into:
      • System health
      • Incident trends
      • SLO performance
  • Translate technical data into actionable business insights

Required Qualifications

  • 10 years in engineering, operations, or SRE roles
  • 5 years leading SRE, platform, or reliability-focused teams
  • Proven experience implementing SRE practices at scale (SLIs, SLOs, error budgets)
  • Strong background in cloud environments (AWS, Azure, GCP)
  • Hands-on experience with observability tools (Splunk, AppDynamics, Prometheus, etc.)
  • Experience in incident management and production operations at scale
  • Ability to operate effectively in high-pressure and complex enterprise environments

Preferred Qualifications

  • Experience driving organizational transformation (not just technical implementation)
  • Strong understanding of CI/CD, DevOps, and automation practices
  • Experience working in regulated or large enterprise environments
  • Familiarity with AIOps or advanced automation strategies

Key Success Indicators

  • Increased adoption of SLOs and reliability standards
  • Reduction in high-severity incidents over time
  • Improved MTTR and operational efficiency
  • Increased adoption of standardized observability practices
  • Reduction in reactive, ticket-driven work across teams
  • Clear alignment between SRE, PSE, and application teams

Core Competencies

  • Strategic thinking with strong execution focus
  • Ability to drive alignment across multiple teams and stakeholders
  • Strong communication and influence skills
  • Bias toward structure, clarity, and accountability
  • Ability to operate with urgency and discipline in complex environments