Site Reliability Engineer Infrastructure

Prague - Czech Republic

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Why do we love building Make (and why you might too)

We're building one of the fastest-growing automation platforms, and we are looking to hire a passionate professional with a strong interest in shaping a career focused on cloud services. Our team is strictly technical, so you won't be bothered by business areas. We automate wherever possible and focus on cloud technologies and the Kubernetes platform to enable Product teams to release new versions according to their needs. Sharing knowledge and feedback inside the team is among our top priorities, and we welcome others with the same mindset. Our aim is to ensure the maximum stability of our infrastructure and thus contribute to the satisfaction of our customers.

What you'll do

Design and implement the architectural blueprints that allow our global automation platform to scale while maintaining high availability.
Define the SLIs, SLOs, and error budgets that guide our engineering teams' balance between rapid feature velocity and system stability.
Build and maintain observability pipelines using metrics, logs, and traces to provide engineers with immediate, actionable clarity on service behavior in production.
Participate in the resolution of production incidents and follow the blameless postmortem process to transform system failures into permanent technical improvements.
Cultivate an engineering environment focused on continuous learning from outages to proactively harden our platform against future regressions.
Develop and automate our CI/CD pipelines to ensure code changes are validated and deployed safely using strategies such as canary or blue/green releases.
Introduce and scale chaos engineering experiments to identify and fix infrastructure weak points before they can impact our customers.
Collaborate with developers during early design phases to ensure all new services meet our strict standards for scalability, security, and reliability.
Mentor senior engineers across the organization and represent SRE principles in technical leadership forums to ensure long-term platform health.
Participate in an on-call rotation to respond to incidents and maintain the 24/7 availability of the platform.

Our Tech stack

Back-end: TypeScript, PostgreSQL, RabbitMQ, Redis, Elasticsearch
Front-end: Angular, TypeScript, Redux, Web Components, Canvas, Nx
Infra: Amazon AWS, Docker, Kubernetes
CI/CD: GitHub, CircleCI, ArgoCD
Monitoring: DataDog
AI: Claude Code, Cursor, Gemini, GitHub Copilot (and experimenting with more)

What we expect from you

You get AI. You're already using it daily to move faster and build better, whether you're designing a system, writing code, or shipping features.
6 years of experience in Software Engineering or SRE roles with a proven track record of technical leadership.
A thorough understanding of how to apply SLI and SLO principles to drive meaningful reliability outcomes.
A development-first mindset where you approach infrastructure challenges through the lens of a software engineer.
Significant experience in mentoring and leveling up other senior engineers within a high-growth environment.
Deep proficiency in managing and operating Linux/Unix-based infrastructure at scale.
Extensive practical knowledge of cloud providers, with a strong preference for AWS.
Expert-level experience with container orchestration, specifically running production workloads on Kubernetes.
Advanced skills in Infrastructure as Code (IaC) using tools like Terraform to maintain version-controlled environments.
Direct experience building and optimizing CI/CD pipelines and executing modern deployment strategies like canary or blue/green.
Excellent communication skills in English to collaborate effectively with our international teams.

What we offer

RSU grant in a rapidly growing company that is raising its value every day
Annual bonus
Multinational team with 42 nationalities creating the future of automation
Learning & Development plan (online, language, and professional courses, conference tickets, and other training) & 2 learning days per year
Notebook/MacBook and 34" curved monitor
25 days of vacation, 4 sick days, company day off on 31.12.
10 care days to care for your loved ones
Extra parental vacation (3-6 months)
RSU grant for a newborn child
Life insurance
Benefit Plus Cafeteria (incl. MultiSport Card)
Remote working allowance
Snack bar, coffee, tea, fruit and vegetables, and sweets all day, every day, available for everyone
Wednesday lunch and Friday break with company-provided food and drinks, music, and lively discussion
Flexible working hours, home office
Company therapy pets in the Prague office (dog-friendly office)
Company 3D printer
Team buildings, parties, and company events multiple times a year

#careeratmake
