Site Reliability Engineer II


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

Company Description

Aqilea is an IT and engineering consulting partner that helps companies get more out of their technology and operations. With teams in Stockholm and Bangalore, we work closely with our clients to build solutions that fit their needs, from software development, AI, and infrastructure engineering to industrial automation and embedded systems.

We combine strong technical expertise with a practical, business-focused approach to help organizations modernize, improve security, and scale with confidence. Above all, we focus on long-term partnerships built on trust, quality, and real results.

With us, you have great opportunities to take real steps in your career and to take on significant responsibility.

About the Role

Company: Aqilea India

Role: Site Reliability Engineer (SRE)

Experience: 5 to 10 years

Location: Bangalore (Hybrid)

Job Summary

We are seeking an experienced Site Reliability Engineer (SRE) to join our cross-functional product team and drive operational excellence, reliability, and performance across our eCommerce platforms. The ideal candidate will possess strong expertise in SRE principles, DevOps practices, cloud technologies, and production support within a microservices-based architecture. This role focuses on ensuring application stability, proactive monitoring, incident management, automation, and continuous improvement of platform reliability.

Key Responsibilities

  • Work within cross-functional product teams as the reliability expert for assigned products or product areas.
  • Apply Site Reliability Engineering practices and standards in collaboration with SRE governance teams.
  • Ensure high-quality service delivery and provide operational KPI reporting.
  • Collaborate closely with product teams to maintain predictable operations and minimize production disruptions.
  • Drive continuous improvement initiatives by sharing best practices and enhancing operational processes.
  • Monitor, manage, troubleshoot, and resolve application and infrastructure issues across production environments.
  • Perform technical analysis and root cause investigations for complex production incidents.
  • Improve system reliability through proactive monitoring, alerting, and preventive measures.
  • Analyze application code and logs to identify opportunities for product and operational improvements.
  • Develop automation solutions for monitoring, housekeeping activities, and incident prevention.
  • Ensure application and environment stability, availability, and performance.
  • Automate development and operational processes using scripting and infrastructure automation tools.
  • Participate in on-call support rotations and resolve business-critical incidents within SLA targets.
  • Define and track reliability metrics, including SLIs, SLOs, and Error Budgets.
  • Contribute to performance engineering and application reliability initiatives.

Required Skills & Qualifications

Technical Expertise

  • Minimum 5 years of experience in Site Reliability Engineering, Production Support, Operations, DevOps, or Software Development.
  • Strong experience supporting and operating eCommerce platforms.
  • Hands-on experience with DevOps practices, including CI/CD, automated testing, and release automation.
  • Experience troubleshooting complex distributed systems and microservices-based architectures.
  • Strong understanding of solution architecture and root cause analysis techniques.
  • Experience working with API-driven frameworks such as commercetools, Fabric, or similar platforms.
  • Experience with ITIL processes and ITSM tools such as ServiceNow.
  • Knowledge of application reliability and performance engineering principles.
  • Experience supporting web, desktop, and mobile applications.

Cloud & Infrastructure

  • Hands-on experience with cloud platforms such as Microsoft Azure and/or Google Cloud Platform (GCP).
  • Experience with managed Kubernetes services such as AKS and/or GKE.
  • Experience provisioning and managing infrastructure using Terraform and/or Ansible.
  • Knowledge of cloud-native architecture, scalability, and reliability best practices.

Development & Automation

  • Proficiency in at least one programming language:
    • Python
    • Java
    • C#
    • Go
    • Ruby
  • Experience with GitHub Actions for CI/CD workflow development.
  • Familiarity with Azure DevOps and other deployment automation platforms.
  • Understanding of front-end technologies such as ReactJS and React Native is advantageous.

Monitoring & Reliability

  • Hands-on experience with observability and monitoring tools such as:
    • Splunk
    • Grafana
    • Similar monitoring platforms
  • Strong understanding of:
    • Service Level Indicators (SLIs)
    • Service Level Objectives (SLOs)
    • Error Budgets
    • Incident Management
    • Reliability Engineering Practices

Start: Immediate to 15 Days

Location: Bangalore (Hybrid)
