Senior Associate Cloud SRE (Azure & AWS)

Datavail Infotech


Job Location:

Mumbai - India

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

Job Title: Senior Associate Cloud SRE

Education: Any Graduate

Experience: 5 to 10 years

Location: Mumbai

Overview

We are seeking a Site Reliability Engineer to deliver tier two managed services support for cloud operations across AWS and Azure environments. This role combines advanced troubleshooting and operational excellence with proactive reliability engineering, focusing on maintaining 24x7x365 service availability while continuously improving automation and operational efficiency across multi-cloud infrastructure.

Role Summary

As a Site Reliability Engineer supporting multi-cloud infrastructure (AWS and Azure), you will manage complex operational challenges and escalations while implementing reliability best practices across production systems. You will work collaboratively with customer teams and senior engineers to ensure system stability, automate operational workflows, and maintain comprehensive observability. This is a delivery-focused role requiring both advanced technical execution and operational ownership across cloud platforms.

Primary Responsibilities

Multi-Cloud Operations & Managed Services

AWS Operations:

  • Provide 24x7x365 tier two support and escalation handling for AWS environments

  • Execute complex operational tasks including:

      • Patching and managing Amazon Machine Images (AMIs)

      • Creating and configuring EC2 instances and RDS databases

      • Managing IAM roles, users, and policies

      • Configuring S3 bucket policies and Access Control Lists (ACLs)

      • Opening and managing network routes (VPCs, subnets, security groups)

      • Restoring snapshots and database backups to lower environments

      • Increasing disk sizes (EBS volumes) and managing storage optimization

      • Implementing proper tagging for environment identification and cost allocation

      • Managing log archiving using CloudWatch Logs and S3

Azure Operations:

  • Provide equivalent tier two support for Azure cloud environments

  • Execute Azure-specific operational tasks including:

      • Managing and updating Azure Virtual Machine images

      • Creating and configuring Azure Virtual Machines and Azure SQL databases

      • Managing Azure Active Directory (AAD) identities, roles, and role-based access control (RBAC)

      • Configuring Azure Storage account policies and access controls

      • Managing Virtual Networks, Network Security Groups (NSGs), and route tables

      • Restoring VM snapshots and database backups to lower environments

      • Managing disk resizing and optimizing Azure Managed Disks

      • Implementing Azure resource tagging and cost management

      • Managing log archiving using Azure Monitor and Log Analytics

Cross-Cloud Responsibilities:

  • Handle escalations from tier one support with deep technical analysis across both platforms

  • Provide root cause analysis for complex incidents in multi-cloud environments

  • Implement consistent operational standards across AWS and Azure

  • Support hybrid cloud connectivity and integration scenarios

Reliability & Incident Management

  • Implement and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) across AWS and Azure in collaboration with senior engineers and customer stakeholders

  • Lead tier two incident response, performing advanced troubleshooting and resolution on both cloud platforms

  • Conduct thorough post-incident analysis with actionable remediation plans

  • Reduce reactive work by improving runbooks, alert configurations, and standard operating procedures for both clouds

  • Apply reliability engineering best practices with oversight and review

  • Mentor tier one engineers during incident response across multi-cloud scenarios

Automation & Infrastructure as Code

  • Build and maintain CI/CD pipelines for infrastructure and application deployments on AWS and Azure

  • Automate complex operational tasks, including patching, backups, and environment provisioning, across both platforms

  • Develop infrastructure automation using Terraform for multi-cloud environments

  • Create sophisticated scripts and tooling to eliminate manual toil and improve operational efficiency

  • Implement Azure Resource Manager (ARM) templates or Bicep for Azure-specific automation

  • Follow established patterns and contribute to continuous improvement

  • Document automation processes for knowledge sharing across cloud platforms

Containerization & Deployment

  • Deploy and operate containerized workloads using Docker on AWS services (ECS, EKS) and Azure services (AKS, Azure Container Instances)

  • Support container reliability through proper health checks, autoscaling configurations, and resource management on both platforms

  • Implement safe deployment patterns (canary and blue/green deployments) across AWS and Azure

  • Troubleshoot complex containerization and orchestration issues in multi-cloud Kubernetes environments

  • Follow and enhance established containerization standards across both cloud providers

Observability & Performance

  • Configure and maintain comprehensive monitoring, logging, and alerting systems across AWS CloudWatch and Azure Monitor

  • Leverage observability data to identify issues and lead root cause analysis in multi-cloud environments

  • Contribute to performance tuning and cost optimization initiatives across both platforms

  • Ensure proper instrumentation and telemetry across AWS and Azure environments

  • Identify patterns and trends to prevent future incidents

  • Build custom dashboards and reports using CloudWatch, Azure Monitor, and third-party tools (Datadog, Grafana)

Collaboration & Customer Engagement

  • Work closely with customer development and operations teams to improve system operability across cloud platforms

  • Participate in design reviews and reliability assessments for multi-cloud architectures

  • Communicate technical concepts, tradeoffs, and recommendations clearly to stakeholders

  • Provide regular operational updates and service reports covering both AWS and Azure

  • Act as technical liaison between customers and internal engineering teams

Required Qualifications

Experience

  • 3-5 years of hands-on experience in DevOps, SRE, or production operations roles

  • Proven experience operating production systems in AWS OR Azure (deep expertise in one required)

  • Working knowledge or exposure to the secondary cloud platform (ability to learn and support)

  • Demonstrated experience managing containerized applications in production

  • Experience delivering managed services or supporting customer-facing infrastructure

  • Track record of handling complex technical escalations in cloud environments

Technical Skills - Primary Cloud Platform (AWS OR Azure)

For AWS-Primary Candidates:

  • AWS Services (Expert): Deep knowledge of EC2, RDS, S3, IAM, VPC, CloudWatch, Lambda, and related services

  • AWS Networking (Expert): Strong experience with VPCs, subnets, security groups, route tables, and VPN/Direct Connect

  • AWS Storage (Expert): Proficiency with EBS, S3, and backup/restore strategies

  • AWS Containers (Expert): Hands-on experience with ECS, EKS, or Fargate

  • Azure (Foundational): Basic understanding of Azure services with willingness to learn; exposure to Azure VMs, Storage, or networking is a plus

For Azure-Primary Candidates:

  • Azure Services (Expert): Deep knowledge of Azure VMs, Azure SQL, Storage Accounts, Azure AD, Virtual Networks, and Azure Monitor

  • Azure Networking (Expert): Strong experience with VNets, NSGs, Application Gateway, Azure Firewall, and ExpressRoute

  • Azure Storage (Expert): Proficiency with Managed Disks, Blob Storage, and Azure Backup

  • Azure Containers (Expert): Hands-on experience with AKS (Azure Kubernetes Service) and Azure Container Instances

  • AWS (Foundational): Basic understanding of AWS services with willingness to learn; exposure to EC2, S3, or VPC is a plus

Technical Skills - Cross-Platform (All Candidates)

  • Infrastructure as Code: Proficiency with Terraform (preferred) or CloudFormation/ARM templates

  • CI/CD: Experience building and maintaining automated deployment pipelines (Azure DevOps, GitHub Actions, Jenkins, GitLab CI)

  • Scripting/Programming: Proficiency in Python, PowerShell, Bash, or similar languages

  • Containerization: Strong Docker skills and Kubernetes experience

  • Monitoring & Logging: Experience with cloud-native monitoring tools and/or third-party observability platforms (Datadog, Splunk, ELK, Grafana)

  • Version Control: Proficiency with Git and collaborative development workflows

  • Troubleshooting: Advanced diagnostic and problem-solving capabilities

Operational Capabilities

  • Experience with 24x7 operations and tier two escalation support

  • Strong troubleshooting and root cause analysis skills

  • Understanding of networking concepts, security best practices, and compliance requirements

  • Familiarity with backup/restore procedures and disaster recovery planning

  • Ability to work under pressure during critical incidents

  • Experience coordinating across distributed teams

  • Willingness and ability to quickly learn the secondary cloud platform

Preferred Qualifications

Certifications

  • AWS Certifications (for AWS-primary): Solutions Architect Associate, SysOps Administrator, or DevOps Engineer Professional

  • Azure Certifications (for Azure-primary): Azure Administrator Associate (AZ-104) or Azure Solutions Architect Expert (AZ-305)

  • Cloud-agnostic certifications (Terraform Associate, CKA, or SRE Foundation)

Additional Preferred Experience

  • Any hands-on experience with both AWS and Azure (even if limited in one)

  • Experience with Kubernetes in production environments

  • Prior consulting or managed services provider experience

  • Experience with hybrid cloud or cloud migration projects

  • Experience with configuration management tools (Ansible, Chef, Puppet)

  • Knowledge of security and compliance frameworks (HIPAA, SOC 2, PCI DSS)

  • Experience in high-traffic or mission-critical industries

  • Experience with cost optimization and FinOps practices

  • Multi-cloud architecture or implementation experience




Required Experience:

Senior IC


About Company


Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading ...
