Are you driven by a deep curiosity about how complex distributed systems work and, more importantly, how they fail? Do you believe reliability is the most critical feature of any service?
At Genomics England, we're pushing the boundaries of science and technology to transform patient outcomes, and our platform underpins it all.
We're looking for a Site Reliability Engineer to ensure our platform is not just running but is sustainably reliable, scalable and resilient. As an SRE advocate, you will actively collaborate with engineering squads to cultivate a culture of reliability. You will play a pivotal role in driving our technical evolution, influencing and shaping platform practices across the organisation.
Your responsibilities will include automating and optimising infrastructure to improve workload throughput. You will focus on implementing proactive measures to anticipate and address potential issues before they impact our users. You can't fix what you don't measure, so there will be a focus on developing monitoring and metrics that teams will rely on day to day. Through this approach, you will help create a platform that is not only scalable and resilient but also ready to meet the demands of our mission.
What You'll Be Doing Day-to-Day:
Your work will be a balance of proactive engineering and thoughtful operational practice. You'll move between different modes, from deep project work and strategic initiatives to collaboration and incident response. Your primary mission will be to:
Champion Reliability: Work with engineering teams to define and measure what matters to our users, establishing and monitoring SLIs, SLOs and error budgets that drive data-informed decisions.
Learn from Failure: Be involved in blameless post-incident reviews that focus on identifying contributing factors, ensuring we turn every failure into a valuable opportunity for systemic improvement.
Eliminate Toil: Systematically identify and automate repetitive, manual and tactical operational processes. You'll reduce operational load by building solutions with enduring value.
Build Resilient Systems: Design, build and maintain robust infrastructure across AWS and on-prem environments using Infrastructure as Code and automation. You'll also drive performance tuning, capacity planning and cost optimisation.
Enable Developer Velocity: Develop CI/CD pipelines, release automation and platform tooling that help our engineering squads deploy changes safely and efficiently without sacrificing reliability.
What You'll Bring:
We're looking for someone who not only advocates for the SRE mindset but can also implement it with robust code, thoughtful automation and scalable architecture.
Mindset & Approach:
Deep-Seated Curiosity: You're driven to understand how systems truly behave in production, not just how they are supposed to work.
A Systems Thinker: You can zoom out to see the big picture and zoom in to troubleshoot the details, understanding that reliability is an emergent property of the entire system.
Relentlessly Collaborative: You see reliability as a shared responsibility, actively seeking out different perspectives and treating SRE as a dialogue. You're open to new ideas, welcome diverse viewpoints, and thrive on teaching, learning and driving initiatives with colleagues across various teams.
Incident Responder: You remain calm under pressure, applying a structured approach to troubleshooting when the pager rings. You know how to take charge of an incident, coordinate a response and mitigate issues efficiently.
Views Failure as an Opportunity: You champion blameless post-incident reviews as a core learning mechanism, focusing on process and technology, not people.
Customer-Focused: You understand that reliability must be measured from the customer's perspective to be meaningful.
Technical Experience:
Experience applying Site Reliability Engineering principles in a production environment.
Strong hands-on experience with AWS services across compute, storage, networking and security.
Deep understanding of distributed systems and their common failure modes, including issues related to latency, data consistency and fault tolerance.
Experience with capacity planning, performance engineering and designing systems that scale to meet traffic demands and remain fault-tolerant under pressure.
Excellent Infrastructure as Code skills (Terraform essential).
Solid scripting and software engineering fundamentals in languages like Python or Bash, with an ability to debug code, handle errors and understand system architecture.
Experience with observability and alerting tools (e.g. Datadog, CloudWatch, Opsgenie) and a passion for turning data into actionable insights.
Knowledge of CI/CD tools (e.g. GitLab CI, Jenkins) and release engineering best practices.
Familiarity with container orchestration (ECS, Kubernetes) and running production-grade infrastructure at scale.
A good understanding of networking fundamentals (DNS, TCP/IP, HTTP) and their practical application, including load balancing and traffic management.
Familiarity with relational (e.g. PostgreSQL) and NoSQL databases.
Nice to Haves:
Exposure to new tech evaluation, lean experimentation or platform tooling decisions.
Experience mentoring or sharing knowledge across teams.
Understanding of genomics, HPC, data-heavy workloads or regulated environments.
Qualifications:
Formal qualifications are not mandatory. We value practical experience, a curious mind and a passion for reliability. Relevant certifications in AWS, Terraform or other technologies are welcome and highly beneficial.
Additional Information:
Closing Date: Monday 20th October at 23:00 (UK time)
Salary From: 71300
Being an integral part of such a meaningful mission is extremely rewarding in itself, but in order to support our people, we're continually improving our benefits package. We pride ourselves on investing in our people and supporting them to achieve their career goals, as well as offering a benefits package including:
Equal opportunities and our commitment to a diverse and inclusive workplace
Genomics England is actively committed to providing and supporting an inclusive environment that promotes equity, diversity and inclusion best practice, both within our community and in any other area where we have influence. We are proud of our diverse community, where everyone can come to work and feel welcomed and treated with respect, regardless of any disability, ethnicity, gender, gender identity, religion, sexual orientation or social background.
Genomics England's policies of non-discrimination and equity will be applied fairly to all people, regardless of age, disability, gender identity or reassignment, marital or civil partnership status, being pregnant or recently becoming a parent, race, religion or beliefs, sex or sexual orientation, length of service, whether full or part-time or employed under a permanent or a fixed-term contract, or any other relevant factor.
Genomics England does not tolerate any form of discrimination, harassment, victimisation or bullying at work. Such behaviour is contrary to our virtues, undermines our mission and core values, and diminishes the dignity, respect and integrity of all parties. Our People policies outline our commitment to inclusivity.
We aim to remove barriers in our recruitment processes and to be flexible with our interview processes. Should you require any adjustments that may help you to fully participate in the recruitment process, we encourage you to discuss this with us.
Blended working model
Genomics England operates a blended working model, as we know our people appreciate the flexibility that hybrid working can bring. We expect most people to come into the office a minimum of 2 times each month. However, this will vary according to role and will be agreed with your team leader. There is no expectation that people will return to the office full time unless they want to; however, some of our roles require full-time on-site attendance, e.g. lab teams and the reception team.
Our teams and squads have reflected, and will continue to reflect, on what works best for them to work together successfully, and have the freedom to design working patterns to suit beyond the minimum. Our office locations are Canary Wharf, Cambridge and Leeds.
Onboarding background checks
As part of our recruitment process, all successful candidates are subject to a Standard Disclosure and Barring Service (DBS) check. We therefore require applicants to disclose any previous offences at the point of application, as some unspent convictions may mean we are unable to proceed with your application due to the nature of our work in healthcare.
Remote Work:
No
Employment Type:
Full-time