Senior Site Reliability Engineer (SRE)

Experian

Job Location:

Nottingham - UK

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

We are looking for a Site Reliability Engineer (SRE) to improve the reliability and performance of business-critical systems. You will focus on AWS cloud infrastructure, DevOps tooling, and core SRE practices within a distributed production environment. Reporting to our Lead, you will work with development, platform, and operations teams to ensure systems are stable, scalable, and well-monitored, and that they meet defined reliability targets.

Main Responsibilities

Reliability and Operations:

Support high availability, scalability, and performance of production systems
Work with defined SLIs, SLOs, and SLAs, ensuring services meet agreed reliability targets
Identify and reduce operational toil through automation and process improvement
Contribute to the design and implementation of fault-tolerant and resilient systems
Participate in resilience and failure testing activities to validate system behaviour under fault conditions and improve recovery

AWS & Cloud Operations:

Manage and operate systems hosted on AWS (EC2, EKS/ECS, RDS, S3, Lambda, CloudWatch, IAM, and VPC)
Support cloud deployments and infrastructure changes following best practices
Help with backup, disaster recovery, and resiliency planning

DevOps & Automation:

Work with CI/CD pipelines and DevOps practices to ensure reliable and repeatable deployments, including build, test, and release automation processes
Use Infrastructure as Code tools such as Terraform or CloudFormation to manage and provision infrastructure
Develop automation using scripting languages (Python, Bash, or similar) to reduce operational toil and improve efficiency

Incident Management:

Participate in production incident response, troubleshooting, and service restoration
Perform root cause analysis (RCA) and contribute to post-incident reviews
Help implement preventive actions to avoid incident recurrence

Observability:

Configure and maintain monitoring, logging, and alerting using tools like CloudWatch, Prometheus, Grafana, Splunk, or Dynatrace
Develop dashboards to track system and platform health and reliability metrics across the user journey
Improve alert quality to reduce noise and improve response times

Collaboration:

Work with application and engineering teams to embed reliability into system design
Collaborate within a globally distributed team, using clear handovers to ensure continuity
Share knowledge and contribute to team-wide best practices
Communicate with stakeholders at all levels, influencing decisions through reliability-focused insights

Qualifications:

Experience in production support, DevOps, SRE, cloud operations, or systems engineering

Cloud Expertise

Hands-on experience with AWS cloud services, including compute, container, and serverless workloads
Practical experience with CI/CD pipelines and DevOps practices, including Git-based version control, pull request workflows, code reviews, and deployment automation
Experience with SRE principles, monitoring, and reliability engineering practices
Proficiency in scripting (Python, Bash, or similar) for automation and operational tooling
Experience with Linux systems and troubleshooting production issues

Additional Preferred Experience

Exposure to data platforms and data pipelines
Understanding of data reliability concepts
Experience supporting or operating complex distributed systems

Additional Information:

Benefits package includes:

Hybrid working
Great compensation and discretionary bonus
Core benefits include pension, Bupa healthcare, Sharesave scheme, and more
25 days annual leave with 8 bank holidays and 3 volunteering days. You can purchase additional annual leave.

We take our people agenda very seriously and focus on what matters: DEI, work/life balance, development, authenticity, collaboration, wellness, reward & recognition, volunteering... the list goes on. Experian's people-first approach is award-winning: World's Best Workplaces 2024 (Fortune Top 25), Great Place To Work in 24 countries, and Glassdoor Best Places to Work 2024, to name a few. Check out Experian Life on social or our Careers Site to understand why.

Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is an important part of Experian's DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability, or age. If you have a disability or special need that requires accommodation, please let us know at the earliest opportunity.

Experian Careers - Creating a better tomorrow together

Find out what it's like to work for Experian by clicking here

#LI-Hybrid

This is a hybrid remote/in-office role.

Employment Type:

Full-time

About Company

Experian

Experian is a global data and technology company, powering opportunities for people and businesses around the world. We help to redefine lending practices, uncover and prevent fraud, simplify healthcare, create marketing solutions, and gain deeper insights into the automotive market, ...
