Senior Site Reliability Engineer

Dublin - Ireland

Monthly Salary: Not Disclosed

Posted on: 5 hours ago

Vacancies: 1 Vacancy

Job Summary

Role Overview

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS cloud infrastructure, containerised platforms and Azure DevOps CI/CD pipelines. The successful candidate will focus on improving system reliability, availability, performance and scalability while enabling engineering teams to deliver high-quality services efficiently.

This role combines engineering and operational excellence with a focus on automation, observability, scalability and resilience across cloud-native environments. As a senior engineer, you will drive engineering-led solutions to reduce operational toil, enhance system reliability and promote DevOps and SRE best practices.

Note: This is a reliability-focused engineering role with on-call responsibilities and involvement in platform modernisation initiatives.

Key Responsibilities

Design, implement and manage highly available and scalable infrastructure on AWS.
Build, maintain and optimise DevOps Pipelines (CI/CD) for automated build, test and deployment processes.
Implement end-to-end CI/CD workflows, including multi-stage pipelines, approvals and release strategies.
Manage and support Windows- and Linux-based production systems.
Deploy, manage and optimise containerised applications using Docker and Kubernetes (EKS/AKS).
Implement Infrastructure as Code (IaC) using Terraform, CloudFormation or ARM templates.
Develop and maintain automation scripts using PowerShell, Bash or Python.
Define and monitor SLIs, SLOs and SLAs to ensure system reliability.
Implement robust monitoring, logging and alerting solutions (CloudWatch, Prometheus, Grafana, Azure Monitor).
Lead incident management, troubleshooting and root cause analysis (RCA) for production issues.
Drive performance tuning and capacity planning for applications and infrastructure.
Collaborate with development teams to improve deployment strategies (blue-green and canary releases).
Ensure security, compliance and best practices across CI/CD pipelines and infrastructure.

Qualifications

Required Skills & Experience

8 years of experience in Site Reliability Engineering / DevOps / Infrastructure Engineering
Strong hands-on experience with AWS services (EC2, S3, RDS, VPC, IAM, ELB, Auto Scaling, CloudWatch)
Deep expertise in Azure DevOps Pipelines (CI/CD) including YAML pipelines and release automation
Experience designing multi-stage pipelines and deployment strategies
Expertise in Windows Server administration including IIS application support
Strong experience with Linux system administration
Hands-on experience with Docker and Kubernetes (EKS/AKS)
Experience with Infrastructure as Code (Terraform, CloudFormation or ARM templates)
Strong scripting skills in PowerShell (mandatory) and Bash/Python
Experience with monitoring and logging tools (Prometheus, Grafana, ELK, CloudWatch)
Solid understanding of networking, security and cloud architecture principles

Preferred Qualifications

Experience with hybrid cloud or multi-cloud environments
Knowledge of Active Directory, Group Policy and enterprise Windows environments
Familiarity with Helm, GitOps practices or service mesh technologies
Experience with performance testing and tuning
Relevant certifications (AWS, Kubernetes, Azure DevOps)

Key Competencies / Characteristics

Reliability-driven: Focused on uptime, performance and system resilience
Automation-first mindset: Continuously reduces manual effort and operational toil
Ownership mentality: Takes end-to-end responsibility from design through production
Strong communicator: Clearly articulates incidents, RCA outcomes and technical concepts
Collaborative: Works effectively with platform, security and application teams
Mentorship mindset: Actively supports and develops junior team members
Continuous learner: Keeps up with evolving SRE practices and cloud-native technologies

Additional Information

D&I statement

Remote Work:

Employment Type: Full-time

About Company

Mars Capital

Due to the continued growth of our servicing platform, we are looking for a Team Leader to support the business through this current period of growth. The successful candidate will act as team leader for a team of Customer Service Executives and Asset Managers working within th ...