Platform Operations Engineer (Site Reliability Engineer)

Vertiv Group

Job Location:

Westerville, OH - USA

Monthly Salary: Not Disclosed

Posted on: 7 days ago

Vacancies: 1 Vacancy

Job Summary

Vertiv is seeking a skilled Platform Operations Engineer (Site Reliability Engineer) to serve as the owner of cross-platform observability, incident management, and operational reliability within Vertiv's Digital organization. This individual contributor role is responsible for designing, implementing, and continuously improving monitoring and alerting solutions across Vertiv's digital platform ecosystem, including Compass AI, Writer AI, Site Scope, UiPath, Workato, Cursor, and other approved enterprise tools, while owning incident response processes, SLA management, and operational governance. The Platform Operations / SRE Engineer will operate within the Digital organization and play a central role in advancing Vertiv's Operational Excellence strategic priority by ensuring the availability, performance, and resilience of platforms that power critical digital workflows and business functions.

As an individual contributor in a lead capacity, this role includes proactive reliability engineering, applying SRE principles such as SLOs, error budgets, and blameless post-mortems, and embedding secure coding and operational governance practices across the Digital organization. The Platform Operations / SRE Engineer will define and enforce observability standards, lead incident response and root cause analysis, manage platform-level SLAs, and partner with engineering, security, and business stakeholders to ensure that all digital platforms meet agreed availability and performance targets.

This position partners closely with IT Security, NPDI, Digital delivery teams, and business operations, and is based on site at Vertiv's Westerville, OH headquarters.

Responsibilities

Own Cross-Platform Monitoring & Observability: Design, implement, and maintain end-to-end monitoring, alerting, and observability solutions across Vertiv's digital platform ecosystem, including AI platforms, automation tools, and internal applications, ensuring real-time visibility into system health, performance, and availability.
Lead Incident Response & Management: Serve as the primary escalation point and incident commander for P1/P2 incidents across Digital platforms; lead root cause analysis (RCA), blameless post-mortems, and corrective action tracking to prevent recurrence and reduce mean time to resolution (MTTR).
Manage Platform SLAs & Reliability Targets: Define, instrument, and enforce service level objectives (SLOs), service level indicators (SLIs), and error budgets across Digital platforms; produce regular SLA performance reports for leadership and drive platform improvements to meet or exceed agreed availability and performance targets.
Drive Secure Coding & Operational Governance: Champion secure coding practices and DevSecOps standards within Digital delivery teams; conduct operational readiness reviews for new platform deployments, enforce configuration management and change control processes, and partner with IT Security and NPDI to ensure all platforms meet Vertiv's security and compliance requirements.
Automate Operations & Reduce Toil: Identify and eliminate manual operational toil through automation. This includes automated remediation runbooks and anomaly detection through the use of scripting, IaC tools, and approved automation platforms.
Capacity Planning & Performance Engineering: Analyze platform utilization trends and conduct capacity planning across Digital environments; proactively identify performance bottlenecks and recommend architectural improvements to ensure platforms scale reliably with business demand.
CI/CD Pipeline Reliability & Deployment Support: Partner with Digital delivery teams to ensure CI/CD pipelines are instrumented for reliability, deployment risk is managed through progressive rollout strategies, and production deployments are supported with appropriate rollback and health-check capabilities.
Evaluate & Advance Observability Tooling: Stay current on advancements in observability, AIOps, and SRE tooling; evaluate and recommend new tools and practices that enhance Vertiv's platform operations maturity and drive adoption of modern reliability engineering standards across the Digital organization.

Requirements

Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field; equivalent practical experience considered.
5 years of professional experience in platform operations, site reliability engineering, DevOps, or a related software/infrastructure engineering discipline.
3 years of hands-on experience with enterprise monitoring and observability platforms (e.g., Datadog, Grafana, Prometheus, Azure Monitor, Splunk, or equivalent) in a multi-platform environment.
Demonstrated experience owning and managing incident response processes, post-mortem facilitation, and SLA/SLO governance.
Experience implementing secure coding practices, DevSecOps standards, or operational governance frameworks in an enterprise software delivery environment.

Technical Skills

Proficiency with monitoring and observability tools (Datadog, Grafana, Prometheus, Azure Monitor, Splunk, or equivalent) for cross-platform health and performance tracking.
Strong knowledge of SRE principles, including SLOs, SLIs, blameless post-mortems, and toil reduction practices.
Hands-on experience with cloud platforms (AWS preferred) and familiarity with containerized environments (Docker, Kubernetes) and infrastructure-as-code tooling (Terraform, Ansible, or equivalent).
Proficiency in multiple programming languages (Python, Ruby, PowerShell, Java, JavaScript, C#, etc.) for automation and runbook development.
Experience with CI/CD platforms (GitLab, Jenkins, GitHub Actions, Azure DevOps, or equivalent) and deployment reliability practices, including progressive rollout, feature flags, and automated health checks.

Preferred Qualifications

Google SRE certification, AWS DevOps Professional, Azure certifications, or equivalent SRE/cloud operations certification.
Experience with AIOps tooling or AI-assisted anomaly detection and automated remediation capabilities.
Familiarity with the Vertiv digital platform ecosystem: Workato, UiPath, Power Automate, Compass AI, Writer AI, or Cursor.
Experience applying DevSecOps practices, including SAST/DAST scanning, secrets management, and compliance-as-code, in enterprise environments.
Experience working in Agile/Scrum delivery environments; familiarity with ITIL incident and change management frameworks.

The successful candidate will embrace Vertiv's Core Principals & Behaviors to help execute our Strategic Priorities.

OUR CORE PRINCIPALS: Safety. Integrity. Respect. Teamwork. Diversity & Inclusion.

OUR STRATEGIC PRIORITIES

Customer Focus

Operational Excellence

High-Performance Culture

Innovation

Financial Strength

OUR BEHAVIORS

Own It

Act With Urgency

Foster a Customer-First Mindset

Think Big and Execute

Lead by Example

Drive Continuous Improvement

Learn and Seek Out Development

About Vertiv

Vertiv is a $10.2 billion global critical infrastructure and data center technology company. We ensure customers' vital applications run continuously by bringing together hardware, software, analytics, and ongoing services. Our portfolio includes power, cooling, and IT infrastructure solutions and services that extend from the cloud to the edge of the network. Headquartered in Columbus, Ohio, USA, Vertiv employs around 20,000 people and does business in more than 130 countries. Visit to learn more.

Work Authorization

No calls or agencies, please. Vertiv will only employ those who are legally authorized to work in the United States. This is not a position for which sponsorship will be provided. Individuals with temporary visas such as E, F-1, H-1, H-2, L, B, J, or TN, or who need sponsorship for work authorization now or in the future, are not eligible for hire.

Equal Opportunity Employer

Vertiv is an Equal Opportunity/Affirmative Action employer. We promote equal opportunities for all with respect to hiring, terms of employment, mobility, training, compensation, and occupational health, without discrimination as to age, race, color, religion, creed, sex, pregnancy status (including childbirth, breastfeeding, or related medical conditions), marital status, sexual orientation, gender identity / expression (including transgender status or sexual stereotypes), genetic information, citizenship status, national origin, protected veteran status, political affiliation, or disability. If you have a disability and are having difficulty accessing or using this website to apply for a position, you can request help by sending an email to

