Lead Software Platform Engineer, Platform Engineering

Klaviyo

Job Location:

Boston, MA - USA

Monthly Salary: Not Disclosed
Posted on: 3 days ago
Vacancies: 1 Vacancy

Job Summary

Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to ensure uninterrupted service for Klaviyo customers and to act as a force multiplier for Klaviyo product teams so they can deliver better software faster. The SRE team builds foundational backend services, as well as tooling and automation, to allow product teams to release and scale their software reliably and predictably. Lead SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and to train their peers in topics such as debugging distributed systems, building self-healing applications, and eking out every drop of performance possible. As a Lead Site Reliability Engineer, you will own the ways we solve problems for our customers and make a big impact on the productivity of our product engineering teams. Klaviyo is growing fast, and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at

How You'll Make a Difference

  • Ship foundational services to enable Klaviyo engineering to move faster with confidence
  • Design and develop systems and processes that enable highly available & scalable systems
  • Uncover and advocate for preventative upstream solutions with internal stakeholders
  • Own the technical vision and roadmap for your area, working with stakeholders to solve pain points and deliver value to engineering
  • Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
  • Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, and PostgreSQL to advance Klaviyo's platform
  • Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
  • Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up all of your peers within your team and within Klaviyo
  • Design, write, and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo's services
  • Participate in periodic on-call duties, with a focus on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue
  • Implement architectural improvements to achieve breakthrough results in the operational scalability and reliability of Klaviyo systems
  • Work hand-in-hand with product-facing engineers and other SREs to ship impactful code
  • Perform quantitative analysis to understand and scale Klaviyo systems
  • Evangelize Site Reliability best practices across the engineering organization

Who You Are

  • A solid 10 years of experience in the SRE/DevOps field
  • BA or BS degree in Computer Science, a related field, or equivalent experience
  • Ability to handle yourself in outage situations and to drive failures to root cause analysis and prevention of future issues
  • Understanding of Linux (we run Ubuntu) and all layers of the networking stack
  • Experience working on an engineering team building software
  • Experience writing code using best practices in a language such as Python, Ruby, Go, etc.
  • You've already experimented with AI in work or personal projects, and you're excited to dive in and learn fast. You're hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.

We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.

Please see the independent bias audit report covering our use of Covey here


Required Experience:

IC

Key Skills

  • Spring
  • .NET
  • C/C++
  • Go
  • React
  • OOP
  • C#
  • Data Structures
  • JavaScript
  • Software Development
  • Java
  • Distributed Systems

About Company

Klaviyo unifies AI-powered email marketing and SMS to drive growth, retention, and measurable results. Build personalized, omnichannel experiences across WhatsApp, ecommerce, and more with K:AI Agents.
