Senior Site Reliability Engineer

Zeta Global

Job Location:

Atlanta, GA - USA

Annual Salary: $140,000 - $170,000
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

WHO WE ARE

Zeta Global (NYSE: ZETA) is the AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to make it easier for marketers to acquire, grow, and retain customers more efficiently. Through the Zeta Marketing Platform (ZMP), our vision is to make sophisticated marketing simple by unifying identity, intelligence, and omnichannel activation into a single platform powered by one of the industry's largest proprietary databases and AI. Our enterprise customers across multiple verticals are empowered to personalize experiences with consumers at an individual level across every channel, delivering better results for marketing programs. Zeta was founded in 2007 by David A. Steinberg and John Sculley and is headquartered in New York City, with offices around the world. To learn more, go to .

The Role

We're looking for an experienced Senior Site Reliability Engineer (SRE) who can write production-grade code, has mastery of SLIs, SLOs, and error budgets, and is passionate about building scalable observability systems.

If you:

  • Can code confidently in Python or Golang and solve real-world problems through automation (not only scripting).
  • Have hands-on experience implementing SLIs, SLOs, and distributed tracing in production.
  • Understand Kubernetes, Terraform, and Infrastructure as Code tools.
  • Have hands-on experience with Chaos Engineering and anomaly detection.
  • Are excited about working with high-throughput distributed systems processing millions of transactions daily.

Then this role might be for you!

Key Responsibilities:

  • Design, implement, and manage SLOs, SLIs, and error budgets, ensuring reliability aligns with user expectations and business objectives.
  • Develop production-grade software to enhance system reliability and reduce manual toil through automation.
  • Implement and optimize observability solutions using tools like OpenTelemetry, with a focus on high-cardinality metrics, distributed tracing, and actionable insights.
  • Drive postmortem processes and lead in-depth root cause analyses for incidents, ensuring lessons learned are effectively applied to prevent recurrence.
  • Define and monitor MTTx metrics (MTTA, MTTR, MTTF), using them to guide system improvements and measure reliability progress.
  • Design and participate in Chaos Engineering exercises.
  • Collaborate with engineering teams to design systems with reliability and scalability in mind, incorporating capacity planning, resiliency patterns, and modern deployment strategies (e.g., Canary, Blue-Green).
  • Lead design reviews for alerting strategies ensuring effective signal-to-noise ratios in monitoring and incident management.
  • Advocate for and implement best practices in incident response and system design to achieve optimal uptime and performance.

Your experience:

Strong Coding Background:

  • 4 years of experience as an SRE or in a similar role with hands-on coding.
  • 3 years of software development experience in Python or Golang with a focus on building maintainable production-quality code.

SRE Expertise:

  • Deep understanding of SRE principles, particularly SLIs, SLOs, error budgets, and their real-world application.
  • Hands-on experience conducting postmortems and implementing observability at scale.
  • Hands-on experience conducting chaos engineering exercises.

Observability Skills:

  • Expertise in designing and implementing end-to-end observability solutions using tools like OpenTelemetry, Prometheus, Grafana, or Honeycomb.
  • Experience with distributed tracing and handling high-cardinality metrics in production environments.

Infrastructure Knowledge:

  • 3 years of experience with AWS and proficiency in Kubernetes, Terraform, and Infrastructure as Code (IaC) tools.
  • Strong understanding of distributed systems, microservices architectures, and containerization (Docker, Kubernetes).

Monitoring and Automation:

  • Hands-on experience with CI/CD platforms (GitOps, Jenkins, ArgoCD) and building automated pipelines.
  • Familiarity with tools and frameworks for incident management and operational automation.

Additional Skills:

  • Knowledge of modern deployment strategies (e.g., Canary, Blue-Green) and resiliency patterns (e.g., circuit breakers, retries).
  • Strong analytical skills for statistical analysis of metrics to identify and resolve performance bottlenecks.

BENEFITS & PERKS

  • Unlimited PTO
  • Excellent medical, dental, and vision coverage
  • Employee Equity and Stock Purchase Plan
  • Employee Discounts, Virtual Wellness Classes, and Pet Insurance. And more!


COMPENSATION RANGE

The compensation range for this role is $140000.00 - $170000.00 depending on location and experience.

PEOPLE & CULTURE AT ZETA

Zeta considers applicants for employment without regard to, and does not discriminate on the basis of, an individual's sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity or expression.

We're committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support, and advocate for one another. Learn more about our commitment to diversity, equity, and inclusion here:

Experience:

Senior IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Zeta Global empowers businesses with cutting-edge data-driven marketing solutions. Harness the potential of AI-driven insights and customer engagement.