Application Site Reliability Engineer

Job Location

London - UK

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

We're Capital on Tap
Capital on Tap was founded with the mission to help small business owners and make their lives easier. Today we provide an all-in-one business credit card & spend management platform that helps business owners save time and money. Capital on Tap proudly serves over 200,000 businesses across the world, and our goal is to help 1 million small businesses by 2030.

Why Join Us
We empower you to be innovative and solve complex problems. Take ownership, make an impact, and thrive in our scaling and agile environment.

This is a hybrid role: the SRE team works from our London (Shoreditch) office 1-2 days per week.

SRE at Capital on Tap

At Capital on Tap we run a hybrid embedded SRE model. We aim to work closely with teams across Capital on Tap to give them the best support. Our main objective currently is to gain as much visibility as possible into our platforms' health while offering scalable solutions.

What You'll Be Doing
As a Site Reliability Engineer (SRE), you will ensure the reliability, performance, and availability of our applications.

Your role includes designing, building, and monitoring systems to maximise uptime and efficiency while collaborating with the Platform teams to build reliable, scalable applications. You will also proactively address potential outages and performance issues by implementing structured monitoring and alerting.

Finally, working closely with the product team, you will help determine when new features launch, using service-level agreements (SLAs) to define the required reliability of the platforms through service-level indicators (SLIs) and service-level objectives (SLOs).

  • Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website or application
  • Collaborate with cross-functional teams to define and establish service-level objectives (SLOs) and service-level agreements (SLAs) for critical systems
  • Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues
  • Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
  • Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents
  • Automate repetitive tasks and processes to improve efficiency and reduce manual intervention
  • Create and maintain documentation for system architecture, configuration, and troubleshooting procedures
  • Perform capacity planning and resource allocation to ensure optimal system performance and scalability
  • Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards
  • Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering

We're Looking For
Required skills:

  • Experience in managing a public cloud (Azure advantageous)
  • Experience in Azure DevOps, Octopus, Flux, GitHub, or other CI/CD tools
  • Experience in Python, PowerShell, C#, or other scripting languages
  • Experience with Linux and Microsoft systems
  • Excellent communication skills and ability to collaborate with multiple teams in an agile environment
  • Strong problem-solving and troubleshooting skills, with the ability to analyse and resolve complex technical issues
  • Expertise in monitoring and logging tools (Datadog advantageous)
  • Experience with Kubernetes and containerisation
  • Experience with setting & adjusting SLOs, working with product teams

Nice to have skills:

  • Experience with IaC tools such as Terraform
  • Knowledge of service mesh technologies such as Istio
  • Experience with SQL databases

Diversity & Inclusion
We welcome, consider, and encourage applications from anyone who shares our commitment to inclusivity. Join us in creating a space where authenticity thrives and everyone can do their best work.

Great Work Deserves Great Perks
We try not to take ourselves too seriously (all the time), so we make sure our office is decked out with a pool table, arcade machine, beer tap, and a couple of office dogs thrown in for good measure. Check out our benefits:


  • Private Healthcare including dental and opticians services through Vitality
  • Worldwide travel insurance through Vitality
  • Anniversary Rewards (-week fully paid sabbatical)
  • Salary Sacrifice Pension Scheme up to 7% match
  • 28 days holiday (plus bank holidays)
  • Annual Learning and Wellbeing Budget
  • Enhanced Parental Leave
  • Cycle to Work Scheme
  • Season Ticket Loan
  • 6 free therapy sessions per year
  • Dog Friendly Offices
  • Free drinks and snacks in our offices

Check out more of our benefits, values, and mission here.

Interview Process
First stage: 30-minute intro and values call with Talent Partner (video call)
Second stage: 45-minute CV overview with Head of Department & Engineering Team Leads and/or PM (video call)
Final stage: 60-minute questions and scenario-based interview with SRE Team Lead (video call)

Employment Type

Full Time
