Site Reliability Engineer 12 Month FTC (we have office locations in Cambridge, Leeds and London)

Job Location

London - UK

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Are you driven by a deep curiosity about how complex distributed systems work and, more importantly, how they fail? Do you believe reliability is the most critical feature of any service?

At Genomics England, we're pushing the boundaries of science and technology to transform patient outcomes, and our platform underpins it all.

We're looking for a Site Reliability Engineer to ensure our platform is not just running but is sustainably reliable, scalable and resilient. As an SRE advocate, you will actively collaborate with engineering squads to cultivate a culture of reliability. You will play a pivotal role in driving our technical evolution, influencing and shaping platform practices across the organisation.

Your responsibilities will include automating and optimising infrastructure to improve workload throughput. You will focus on implementing proactive measures to anticipate and address potential issues before they impact our users. You can't fix what you don't measure, so there will be a focus on developing monitoring and metrics that teams will rely on day to day. Through this approach, you will help create a platform that is not only scalable and resilient but also ready to meet the demands of our mission.

What You'll Be Doing Day-to-Day:

Your work will be a balance of proactive engineering and thoughtful operational practice. You'll move between different modes, from deep project work and strategic initiatives to collaboration and incident response. Your primary mission will be to:

  • Champion Reliability: Work with engineering teams to define and measure what matters to our users, establishing and monitoring SLIs, SLOs and error budgets that drive data-informed decisions.

  • Learn from Failure: Be involved in blameless post-incident reviews that focus on identifying contributing factors, ensuring we turn every failure into a valuable opportunity for systemic improvement.

  • Eliminate Toil: Systematically identify and automate repetitive, manual and tactical operational processes. You'll reduce operational load by building solutions with enduring value.

  • Build Resilient Systems: Design, build and maintain robust infrastructure across AWS and on-prem environments using Infrastructure as Code and automation. You'll also drive performance tuning, capacity planning and cost optimisation.

  • Enable Developer Velocity: Develop CI/CD pipelines, release automation and platform tooling that help our engineering squads deploy changes safely and efficiently without sacrificing reliability.

  • Share Your Knowledge: Create clear, usable documentation and act as a consultant and advocate for SRE and DevOps best practices, helping to improve resilience across the entire organisation.


What You'll Bring:

We're looking for someone who not only advocates for the SRE mindset but can also implement it with robust code, thoughtful automation and scalable architecture.

Mindset & Approach: 

  • Deep-Seated Curiosity: You're driven to understand how systems truly behave in production, not just how they are supposed to work.

  • A Systems Thinker: You can zoom out to see the big picture and zoom in to troubleshoot the details, understanding that reliability is an emergent property of the entire system.

  • Relentlessly Collaborative: You see reliability as a shared responsibility, actively seeking out different perspectives and treating SRE as a dialogue. You're open to new ideas, welcome diverse viewpoints, and thrive on teaching, learning and driving initiatives with colleagues across various teams.

  • Incident Responder: You remain calm under pressure, applying a structured approach to troubleshooting when the pager rings. You know how to take charge of an incident, coordinate a response and mitigate issues efficiently.

  • Views Failure as an Opportunity: You champion blameless post-incident reviews as a core learning mechanism, focusing on process and technology, not people.

  • Customer-Focused: You understand that reliability must be measured from the customer's perspective to be meaningful.

Technical Experience: 

  • Experience applying Site Reliability Engineering principles in a production environment. 

  • Strong hands-on experience with AWS services across compute, storage, networking and security.

  • Deep understanding of distributed systems and their common failure modes, including issues related to latency, data consistency and fault tolerance.

  • Experience with capacity planning, performance engineering and designing systems that scale to meet traffic demands and remain fault-tolerant under pressure.

  • Excellent Infrastructure as Code skills (Terraform essential). 

  • Solid scripting and software engineering fundamentals in languages like Python or Bash, with an ability to debug code, handle errors and understand system architecture.

  • Experience with observability and alerting tools (e.g. Datadog, CloudWatch, Opsgenie) and a passion for turning data into actionable insights.

  • Knowledge of CI/CD tools (e.g. GitLab CI, Jenkins) and release engineering best practices.

  • Familiarity with container orchestration (ECS, Kubernetes) and running production-grade infrastructure at scale.

  • A good understanding of networking fundamentals (DNS, TCP/IP, HTTP) and their practical application, including load balancing and traffic management.

  • Familiarity with relational (e.g. PostgreSQL) and NoSQL databases.

Nice to Haves: 

  • Exposure to new tech evaluation, lean experimentation or platform tooling decisions.

  • Experience mentoring or sharing knowledge across teams. 

  • Understanding of genomics, HPC, data-heavy workloads or regulated environments.


Qualifications:

Formal qualifications are not mandatory. We value practical experience, a curious mind and a passion for reliability. Relevant certifications in AWS, Terraform or other technologies are welcome and highly beneficial.
    


Additional Information:

Closing Date: Monday 20th October at 23:00 (UK time) 

Salary From: 71300

Being an integral part of such a meaningful mission is extremely rewarding in itself, but in order to support our people we're continually improving our benefits package. We pride ourselves on investing in our people and supporting them to achieve their career goals, as well as offering a benefits package including:

  • Generous Leave: 30 days holiday plus bank holidays, additional leave for long service, and the option to apply for up to 30 days of remote working abroad annually (approval required).
  • Family-Friendly: Blended working arrangements, flexible working, and enhanced maternity, paternity and shared parental leave benefits.
  • Pension & Financial: Defined contribution pension (Genomics England double-matches up to 10%; however, you can contribute more if you wish), Life Assurance (3x salary) and a Give As You Earn scheme.
  • Learning & Development: Individual learning budgets, support for training and certifications, and reimbursement for one annual professional subscription (approval required).
  • Recognition & Rewards: Employee recognition programme and referral scheme.
  • Health & Wellbeing: Subsidised gym membership, a free Headspace account, access to an Employee Assistance Programme, eye tests and flu jabs.

Equal opportunities and our commitment to a diverse and inclusive workplace 

Genomics England is actively committed to providing and supporting an inclusive environment that promotes equity, diversity and inclusion best practice, both within our community and in any other area where we have influence. We are proud of our diverse community, where everyone can come to work and feel welcomed and treated with respect regardless of any disability, ethnicity, gender, gender identity, religion, sexual orientation or social background.

Genomics England's policies of non-discrimination and equity will be applied fairly to all people regardless of age, disability, gender identity or reassignment, marital or civil partnership status, being pregnant or recently becoming a parent, race, religion or beliefs, sex or sexual orientation, length of service, whether full or part-time or employed under a permanent or a fixed-term contract, or any other relevant factor.

Genomics England does not tolerate any form of discrimination, harassment, victimisation or bullying at work. Such behaviour is contrary to our virtues, undermines our mission and core values, and diminishes the dignity, respect and integrity of all parties. Our People policies outline our commitment to inclusivity.

We aim to remove barriers in our recruitment processes and to be flexible with our interview processes. Should you require any adjustments that may help you to fully participate in the recruitment process, we encourage you to discuss this with us.

Blended working model

Genomics England operates a blended working model, as we know our people appreciate the flexibility that hybrid working can bring. We expect most people to come into the office a minimum of 2 times each month. However, this will vary according to role and will be agreed with your team leader. There is no expectation that people will return to the office full time unless they want to; however, some of our roles require full-time on-site attendance, e.g. lab teams and the reception team.

Our teams and squads have reflected, and will continue to reflect, on what works best for them to work together successfully, and have the freedom to design working patterns to suit beyond the minimum. Our office locations are: Canary Wharf, Cambridge and Leeds.

Onboarding background checks

As part of our recruitment process, all successful candidates are subject to a Standard Disclosure and Barring Service (DBS) check. We therefore require applicants to disclose any previous offences at the point of application, as some unspent convictions may mean we are unable to proceed with your application due to the nature of our work in healthcare.


Remote Work:

No


Employment Type:

Full-time

Department / Functional Area

Engineering
