Site Reliability Engineer (SRE)

Leapwork

Posted on : 09-09-2025

1 Vacancy

Job Location

Gurgaon - India

Monthly Salary

Not Disclosed

Job Description

At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation solution, enabling some of the world's largest enterprises to adopt, scale, and maintain automation in under 30 days.

In today's environment, where efficiency, automation, and cost optimization are essential to enterprise growth, we are uniquely positioned to deliver impact.

In 2023, Microsoft, the world's largest and most recognizable software company, recognised Leapwork as a truly innovative and disruptive product, leading to a strategic partnership that continues to be a major growth catalyst.

If you're contemplating the next step in your career and seek a fast-paced company where you can impact the build and growth of something truly special, look no further!

We are headquartered in Copenhagen, Denmark, and have local offices across Europe, the US, and Asia.

We are looking for an experienced and forward-thinking Senior Site Reliability Engineer (SRE) with deep expertise in Microsoft Azure. In this role, you will ensure the reliability, availability, scalability, and performance of our Azure-based platforms and applications.

You will partner with cross-functional teams to design, implement, and maintain resilient infrastructure while driving automation, monitoring, and optimization initiatives across our cloud environment.

Role Responsibilities:

Service Reliability & SLOs: Define and maintain Service Level Objectives (SLOs) for the systems you own. Continuously measure and improve availability, latency, and overall system health.
Automation & Scalability: Develop automation to scale systems sustainably, prevent service issues, and enable rapid recovery when incidents occur.
Collaboration & Architecture Influence: Partner with development teams to improve reliability, observability, and release velocity. Influence architectural decisions to embed high availability and operability into applications.
Incident Management: Participate in on-call rotations, lead incident response, conduct postmortems, and drive root cause resolution with a focus on prevention.
Monitoring & Observability: Implement and refine monitoring, alerting, and observability solutions (Azure Monitor, Datadog, Grafana, Prometheus, Loki, Tempo) to ensure proactive detection of issues.
Disaster Recovery & BCP: Design, test, and maintain disaster recovery and business continuity strategies to safeguard system availability and data integrity.
Cost Optimization: Monitor and optimize Azure resource usage for performance and cost efficiency.
Engineering Best Practices: Be a vocal advocate for strong engineering practices, enabling scalable, reliable, and performant systems.
Cloud Migration Enablement: Support cloud migration initiatives in partnership with foundation and migration teams, from architectural reviews to operational acceptance testing, as well as configuring Grafana dashboards and Azure Log Analytics metrics.
AI & Intelligent Automation: Leverage AI/ML-driven tools to improve system observability, incident prediction, and automated remediation, ensuring faster recovery and reduced downtime.
SRE Agents: Work with or build SRE Agents to automate routine operational tasks such as log analysis, anomaly detection, incident triage, and performance tuning.
Data-Driven Reliability: Analyse monitoring data using AI/ML to identify hidden trends, optimize system health, and drive continuous improvement in reliability practices.
Documentation & Knowledge Sharing: Maintain detailed documentation of systems, processes, and architecture to ensure alignment and smooth onboarding of team members.
Continuous Learning: Actively participate in and foster a culture of continuous learning and development within the team.
Mentorship: Guide and mentor junior engineers, promoting collaboration and technical growth.

Technical Qualifications / Role Requirements (Must-Have Skills)

Bachelor's degree in Computer Science, Engineering, or a related technical field. Master's degree is a plus.
Proven experience (7 years) working as an SRE with a specific focus on Microsoft Azure Cloud services.
Deep understanding of Azure services, including Azure Kubernetes Service (AKS), Azure App Service, Azure Functions, Azure Monitor, and Azure Resource Manager.
Proficiency in scripting and programming languages (e.g., PowerShell, Python) for automation, infrastructure management, and tool development.
Hands-on experience with containerization and orchestration technologies such as Docker and Kubernetes in an Azure context.
Strong incident management skills with a data-driven and analytical approach to diagnosing complex issues.
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM templates) and configuration management tools (e.g., Ansible, Chef, Puppet).
Familiarity with AI-powered monitoring, anomaly detection, and auto-remediation tools.
Experience working with SRE Agents or similar intelligent automation frameworks for operational efficiency.
Ability to integrate AI-driven insights into incident response, root cause analysis, and reliability engineering.
Excellent problem-solving skills, attention to detail, and a proactive attitude towards addressing operational challenges.
Effective communication and collaboration skills with the ability to work across teams and influence technical decisions.
Experience with CI/CD pipelines and version control systems (e.g., Git).
Relevant Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert) are highly advantageous.
In-depth knowledge of monitoring and alerting tools like Grafana, Prometheus, Loki, and Tempo.
Ability to analyze monitoring data to identify trends and root causes of incidents, leading to continuous improvement of system health.
A strong understanding of DevOps principles and automation practices.

Why Leapwork

We are on an exciting journey of global growth, and this is your chance to get on board, with an opportunity to lead and shape digital transformation initiatives in a forward-thinking company, working with and learning from a talented and passionate team committed to innovation and excellence.

By joining our team, you'll become part of a fast-paced international environment where you can grow, challenge yourself, and do what inspires you. We work hard but have fun while doing it, and we believe that collaboration, social activities, and celebration are keys to success.

Our Leapwork principles:

Our five key principles capture the essence of what it means to be a part of our world-class team! They are integral to how we approach our work and one another, and they serve as a roadmap to our continued growth, development, achievements, and success.

Customer first: We listen to our customers, understand their pain points, and focus on what matters to them.

Lead from the front: Leading means guiding others towards the solutions to our challenges.

Get it done: We make commitments, follow through, and deliver work we're proud of.

Build excellence: We do our best work every day, holding ourselves and others to the highest standards.

Respectfully different: We treat each other with respect, always. We're different, not indifferent.

Employment Type

Full Time
