Sr. Staff Site Reliability (SRE) and DevOps Engineer

Ariel Partners

Job Location:

New York City, NY - USA

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

We are seeking a Staff Site Reliability Engineer (SRE) to improve the reliability, observability, and operational health of our production platform. This role requires someone who can go beyond basic monitoring: the ideal candidate must understand application architecture and service dependencies in order to design meaningful alerts and actionable observability, not just monitoring noise.
This position combines SRE, DevOps, and observability engineering, with a strong focus on improving alert quality, reducing operational fatigue, and strengthening platform reliability.

Key Responsibilities
  • Optimize and clean up Datadog APM instrumentation, monitors, and dashboards to improve signal quality and reduce telemetry costs
  • Design intelligent alerting strategies to reduce PagerDuty alert fatigue
  • Develop monitoring that reflects real user impact and system health, not infrastructure noise
  • Gain a deep understanding of application architecture and service dependencies to diagnose failures and cascading impacts
  • Support DevOps and platform engineering efforts, including automation and CI/CD improvements
  • Participate in on-call support during business hours (Mon-Fri) and lead incident response improvements

Required Qualifications
  • 7 years of experience in Site Reliability Engineering, DevOps, or platform engineering
  • Strong hands-on experience with Datadog (APM, monitoring, dashboards, alerting)
  • Experience designing actionable monitoring and intelligent alerting
  • Strong understanding of distributed systems and application architecture
  • Experience supporting production systems and incident response
  • Solid DevOps, automation, and infrastructure skills

Ideal Candidate
This role is best suited for an engineer who:
  • Understands applications deeply enough to create meaningful alerts
  • Can reduce monitoring noise and operational fatigue
  • Combines SRE reliability practices with strong DevOps engineering skills



If you are interested in getting more information about this opportunity, please contact Irina Rozenberg at your earliest convenience.

At Ariel Partners, we solve the most difficult problems that inhibit technology from enabling our customers to achieve their goals. Our vision is to be recognized by our stakeholders as an elite provider of IT solutions, so that when they face their biggest challenges, we are on their short list. We are looking for team members who share our values of: Integrity to do the right thing even when it hurts; Commitment to the long-term success and happiness of our customers, our people, and our partners; Courage to take on difficult challenges, accept new ideas, and accept incremental failure; and the constant pursuit of Excellence. Ariel Partners is an Equal Opportunity Employer in accordance with federal, state, and local

Required Experience:

Staff IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Ariel Partners is an IT software consulting firm with experience executing some of the largest and most difficult technology projects.