Aqilea is an IT and engineering consulting partner that helps companies get more out of their technology and operations. With teams in Stockholm and Bangalore, we work closely with our clients to build solutions that fit their needs - from software development, AI, and infrastructure engineering to industrial automation and embedded systems.
We combine strong technical expertise with a practical, business-focused approach to help organizations modernize, improve security, and scale with confidence. Above all, we focus on long-term partnerships built on trust, quality, and real results.
With us, you have great opportunities to take real steps in your career and to take on significant responsibility.
About the Role
Company: Aqilea India
Role: Site Reliability Engineer (SRE)
Experience: 5 to 10 years
Location: Bangalore (Hybrid)
Job Summary
We are seeking an experienced Site Reliability Engineer (SRE) to join our cross-functional product team and drive operational excellence, reliability, and performance across our eCommerce platforms. The ideal candidate will possess strong expertise in SRE principles, DevOps practices, cloud technologies, and production support within a microservices-based architecture. This role focuses on ensuring application stability, proactive monitoring, incident management, automation, and continuous improvement of platform reliability.
Key Responsibilities
Work within cross-functional product teams as the reliability expert for assigned products or product areas.
Apply Site Reliability Engineering practices and standards in collaboration with SRE governance teams.
Ensure high-quality service delivery and provide operational KPI reporting.
Collaborate closely with product teams to maintain predictable operations and minimize production disruptions.
Drive continuous improvement initiatives by sharing best practices and enhancing operational processes.
Monitor, manage, troubleshoot, and resolve application and infrastructure issues across production environments.
Perform technical analysis and root cause investigations for complex production incidents.
Improve system reliability through proactive monitoring, alerting, and preventive measures.
Analyze application code and logs to identify opportunities for product and operational improvements.
Develop automation solutions for monitoring, housekeeping activities, and incident prevention.
Ensure application and environment stability, availability, and performance.
Automate development and operational processes using scripting and infrastructure automation tools.
Participate in on-call support rotations and resolve business-critical incidents within SLA targets.
Define and track reliability metrics, including SLIs, SLOs, and Error Budgets.
Contribute to performance engineering and application reliability initiatives.
Required Skills & Qualifications
Technical Expertise
Minimum 5 years of experience in Site Reliability Engineering, Production Support, Operations, DevOps, or Software Development.
Strong experience supporting and operating eCommerce platforms.
Hands-on experience with DevOps practices, including CI/CD, automated testing, and release automation.
Experience troubleshooting complex distributed systems and microservices-based architectures.
Strong understanding of solution architecture and root cause analysis techniques.
Experience working with API-driven frameworks such as commercetools, Fabric, or similar platforms.
Experience with ITIL processes and ITSM tools such as ServiceNow.
Knowledge of application reliability and performance engineering principles.
Experience supporting web, desktop, and mobile applications.
Cloud & Infrastructure
Hands-on experience with cloud platforms such as Microsoft Azure and/or Google Cloud Platform (GCP).
Experience with managed Kubernetes services such as AKS and/or GKE.
Experience provisioning and managing infrastructure using Terraform and/or Ansible.
Knowledge of cloud-native architecture, scalability, and reliability best practices.
Development & Automation
Proficiency in at least one programming language:
Python
Java
C#
Go
Ruby
Experience with GitHub Actions for CI/CD workflow development.
Familiarity with Azure DevOps and other deployment automation platforms.
Understanding of front-end technologies such as ReactJS and React Native is advantageous.
Monitoring & Reliability
Hands-on experience with observability and monitoring tools such as:
Splunk
Grafana
Similar monitoring platforms
Strong understanding of:
Service Level Indicators (SLIs)
Service Level Objectives (SLOs)
Error Budgets
Incident Management
Reliability Engineering Practices
Start: Immediate to 15 Days
Location: Bangalore (Hybrid)