Senior Software Engineer, Reliability

Dublin - Ireland

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

At Klaviyo, we value the unique backgrounds, experiences, and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you're a close but not exact match with the description, we hope you'll still consider applying. Want to learn more about life at Klaviyo? Visit see how we empower creators to own their own destiny.

Senior Software Engineer, Reliability (Dublin)

Team Overview:

As a Senior Software Engineer, Reliability, you'll ensure Klaviyo's critical platforms are reliable, scalable, and sustainable while enabling rapid product development. We treat reliability as a core product feature and use software engineering to solve complex systems and operational challenges.

Our work spans security, infrastructure, and software development, requiring us to understand systems and engineering. We build complex foundational solutions that must be extremely reliable, secure, and performant at global scale.

Our charter is to build and operate foundational services and infrastructure, define clear reliability objectives, reduce operational toil through automation, and continuously improve systems based on real production learnings. The work is highly visible and directly impacts how Klaviyos build software and how customers experience Klaviyo every day.

How You'll Make an Impact:

As a Senior Software Engineer, Reliability, you will build and operate the platforms, systems, and services that underpin Klaviyo's reliability and operational excellence. You will:

Build and operate foundational, security-critical services with a strong emphasis on availability, scalability, latency, and fault tolerance
Apply software engineering principles to automate infrastructure, reduce operational toil, and improve system reliability at scale
Design, implement, and evolve systems using SRE best practices
Define and refine SLIs, SLOs, and error budgets to guide engineering decisions
Improve observability, alerting, and incident response to reduce mean time to detection and recovery
Participate in on-call rotations with a focus on sustainable operations and automatic remediations
Perform quantitative analysis to understand system behavior, capacity constraints, and scaling limits
Identify systemic risks and reliability bottlenecks and drive long-term preventative solutions
Collaborate closely with product, platform, and security engineers to influence architecture early and ship reliable systems
Mentor and pair with other engineers, helping raise the bar for reliability, operational maturity, and engineering excellence

Who You Are:

You are a cloud-native, platform-focused SRE who uses software to build and operate reliable production systems at scale.

You write and maintain production-quality code (e.g., Python, Go, or similar) to build internal platforms, automate operations, and improve system reliability
You have built, deployed, and operated distributed, cloud-native systems and understand failure modes such as partial outages, dependency failures, resource saturation, and cascading impact
You have experience operating containerized workloads and platforms (e.g., Kubernetes) in production, including deployment strategies, scaling behavior, and service networking
You are comfortable participating in on-call rotations and diagnosing production issues
You have designed and operated observability systems and know how to build actionable alerts that reflect real user and service impact
You apply SRE concepts such as SLIs, SLOs, error budgets, and burn-rate-based alerting to guide engineering decisions and operational response
You have hands-on experience with infrastructure as code and declarative configuration (e.g., Terraform, Kubernetes manifests, policy-as-code)
You have performed capacity planning, load testing, and performance analysis for distributed services and platforms
You routinely contribute to post-incident reviews and drive concrete, code-focused follow-up actions that prevent recurrence
You are comfortable reviewing and contributing to technical designs, platform APIs, operational runbooks, and system documentation
You've already experimented with AI in work or personal projects, and you're excited to dive in and learn fast. You're hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.

Nice to Have:

Experience supporting security-critical platforms or building internal security tooling
Familiarity with identity, access management, secrets management, or policy enforcement systems
Experience operating systems at scale in cloud environments (AWS preferred)
Background in resilience testing, fault injection, or chaos engineering
A strong comprehension of algorithms and data structures at scale

Tech Stack:

Klaviyo's platform is primarily built with Python and React and runs on AWS. Engineers join us from a wide range of technical backgrounds and are supported in learning our stack.

Core technologies include:

Python / Django / FastAPI
MySQL / Redis / Memcached
RabbitMQ / Celery / Apache Kafka / Apache Pulsar
AWS / Terraform / Kubernetes

Location & Work Model:

This role is based in Dublin, Ireland, and follows a hybrid working model. Klaviyo supports work authorization and relocation for this position.

At Klaviyo, we enjoy tackling meaningful engineering challenges and value people who take ownership, learn continuously, and collaborate openly. We are committed to building inclusive teams and encourage applications from candidates of all backgrounds.

Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at

We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.

Please see the independent bias audit report covering our use of Covey here

Required Experience:

Senior IC

About Company

Klaviyo

Klaviyo unifies AI-powered email marketing and SMS to drive growth, retention, and measurable results. Build personalized, omnichannel experiences across WhatsApp, ecommerce, and more with K:AI Agents.
