Senior Observability Platform Engineer

Klaviyo

Job Location:

Boston, MA - USA

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

At Klaviyo, Platform Engineering is what you get when you treat operating complex systems as a software engineering problem. Our Observability Platform group applies that philosophy to how we collect, store, and surface signals about the health of our products and infrastructure. We build and run the shared observability stack (metrics, logs, traces, alerting, and developer-facing tooling) that enables every product and platform team at Klaviyo to understand how their systems behave in production and to ship changes with confidence.

As a Senior Observability Platform Engineer, you will design, build, and operate the core observability services that power Klaviyo's monitoring and incident response. You'll partner closely with product engineering, other platform teams, and security to define how we instrument services, standardize telemetry, and make it easy for engineers to debug issues in a fast-growing distributed environment.

How you'll make an impact

  • Own observability platforms end-to-end: Design, implement, and operate scalable, highly available systems for metrics, logging, tracing, and alerting (e.g., Prometheus-compatible metrics, time-series storage, log pipelines, distributed tracing backends).
  • Build opinionated developer experiences: Create libraries, dashboards, runbooks, and self-service tooling that make doing the right thing for observability the easiest path for Klaviyo engineers.
  • Set standards for telemetry: Define and evangelize best practices for instrumentation, SLOs, alerting, and incident readiness across services and teams.
  • Drive reliability through data: Use observability data to identify performance bottlenecks, reliability risks, and architectural improvements, and collaborate with teams to address them.
  • Automate everything: Treat infrastructure as code; build automation for provisioning, configuration, scaling, and upgrades of observability components.
  • Mentor and multiply: Partner with engineers across Klaviyo to level up skills in debugging distributed systems, designing effective alerts, and using observability tools to make better product and reliability decisions.
  • Utilize AI: You've already experimented with AI in work or personal projects, and you're excited to dive in and learn fast. You're hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.

What we're looking for

  • Strong software engineering experience in at least one modern language (e.g., Go, Python, Java) and comfort working in Linux-based production environments.
  • Hands-on experience designing and operating observability systems at scale (for example: Prometheus / Cortex / Thanos / Mimir, OpenTelemetry, Grafana, alerting pipelines, log aggregation systems, or distributed tracing backends).
  • A track record of improving reliability and performance of complex distributed applications using telemetry and data-driven insights.
  • Experience with infrastructure-as-code and modern cloud-native tooling (e.g., Terraform, Kubernetes, service meshes, CI/CD systems).
  • Strong technical communication and collaboration skills: you're comfortable partnering with many teams, writing clear documentation, and leading technical discussions.
  • A mindset that values simple, well-understood systems, iterative improvement, and a bias toward empowering other engineers rather than being on the critical path for every change.

Technologies we use (not exhaustive):

  • Backend: Python, Django, Go
  • Observability Platform: Chronosphere, Cortex, Prometheus, OTEL
  • Testing Frameworks: Pytest
  • Infrastructure and CI: AWS, Kubernetes, Terraform, Helm, Buildkite
  • Data: MySQL, Redis, Kafka

Klaviyo is growing fast, and we have opportunities for engineers who care deeply about reliability, developer experience, and building strong foundational platforms.

We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.

Please see the independent bias audit report covering our use of Covey here.


Required Experience:

Senior IC

Key Skills

  • APIs
  • C/C++
  • Computer Graphics
  • Go
  • React
  • Redux
  • Node.js
  • AWS
  • Library Services
  • Assembly
  • GraphQL
  • High Voltage

About Company

Klaviyo unifies AI-powered email marketing and SMS to drive growth, retention, and measurable results. Build personalized, omnichannel experiences across WhatsApp, ecommerce, and more with K:AI Agents.