Description
As a Lead Site Reliability Engineer in the SRE Product Engineering team at Chase UK, you will be at the forefront of solving the most complex reliability challenges facing the bank's essential services. You will work in all aspects of SRE, including observability, alerting, capacity and chaos engineering, with the goal of ensuring maximum reliability for our customers.
In the SRE Product Engineering team, you will be the face of the SRE Chapter:
- Internally: working closely with the Core SRE team, who own and maintain the SRE projects and services, you will build integrations and implement new features.
- Externally: collaborating with all teams and disciplines, including software developers, cloud engineers and product managers, you will integrate SRE tools and create bespoke solutions for individual products, including payments and cards.
You are a highly technical, adaptable and creative problem solver, capable of finding optimal solutions to reliability problems and contributing to the SRE roadmap. You are a people person and enjoy interacting with stakeholders and building relationships. You are comfortable with an enhanced level of visibility, ownership and responsibility.
Objectives of this Role
- Seek continuous improvement of reliability, monitoring and alerting for our mission-critical microservices.
- Design monitoring and alerting that is customer journey-based and directly proportionate to customer experience, supporting our "you build it, you own it" model. Our alerts must be highly precise, as developer teams are engaged immediately with no triage.
- Think outside the box to eliminate toil and enable controls excellence, automating as much as possible.
- Contribute to internal tools, including our state-of-the-art framework for SLI and error budget aggregation.
- Enhance the performance testing, forecasting and capacity planning framework.
- Contribute to the chaos engineering framework.
Preferred Skills and Qualifications
- Degree in computer science or another highly technical scientific discipline.
- Proven experience as a software engineer, including proficiency in at least one systems programming language (e.g. Python, Go, Java).
- Working knowledge of microservice infrastructure components.
- Excellent debugging and troubleshooting skills.
- Experience with Kubernetes.
- Experience in cloud computing (preferably AWS).
- Experience with common SRE toolchains such as Grafana, Prometheus, Elasticsearch, Kibana and Jaeger is a plus.