Senior Specialist Cloud SRE Azure, AKS & DevOps

Datavail Infotech

Job Location: Mumbai - India

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

Job Title: Senior Specialist (SRE) - Azure, AKS & DevOps

Education: Any Graduate

Experience: 8 to 15 years

Location: Mumbai

Role Overview

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AKS, DevOps, automation, and enterprise operations to lead reliability engineering and managed services delivery for production cloud environments.

This role focuses on ensuring 24x7 availability, performance, security, patch compliance, scalability, and automation across Azure-first environments, with exposure to AWS/GCP.

You will work closely with customers, internal engineering teams, and leadership to drive cloud transformation, implement SRE best practices, modernize DevOps delivery pipelines, and improve measurable service outcomes.

Primary Responsibilities

Reliability Engineering & SRE Practices

  • Define and manage SLIs, SLOs, error budgets, MTTR, change failure rate, and availability targets.

  • Continuously improve platform reliability, scalability, resilience, and operational maturity.

  • Lead Sev-1 / Sev-2 incident management, escalation handling, and RCA reviews.

  • Conduct blameless postmortems and drive preventive actions.

  • Build operational runbooks, self-healing automation, and on-call processes.

  • Participate in architecture reviews for HA, DR, failover, and performance optimization.

Azure Cloud Operations & Engineering

  • Manage enterprise Azure environments including:

      • Azure Virtual Machines

      • VM Scale Sets

      • Azure App Services

      • Azure Functions

      • Azure SQL / Managed Instance

      • Azure Storage

      • Virtual Networks / NSGs

      • Application Gateway / WAF

      • Azure Front Door

      • Load Balancers

      • Azure Backup & Site Recovery

  • Implement Azure Well-Architected Framework best practices.

  • Drive governance using Management Groups, Policy, RBAC, Key Vault, and Defender for Cloud.

  • Optimize cost using Reserved Instances, rightsizing, budgets, and tagging strategy.

AKS & Container Platform Engineering

  • Design, manage, and optimize Microsoft Azure Kubernetes Service (AKS) clusters.

  • Manage cluster upgrades, autoscaling, node pools, ingress controllers, storage classes, and security policies.

  • Support container deployments using Helm, YAML manifests, and GitOps workflows.

  • Improve AKS observability using Prometheus, Grafana, and Azure Monitor for Containers.

  • Ensure platform reliability for microservices workloads.

DevOps, CI/CD & Automation

  • Build and manage CI/CD pipelines using Azure DevOps, GitHub Actions, Jenkins, or GitLab CI.

  • Implement blue/green, rolling, and canary deployments with rollback strategies.

  • Automate infrastructure using Terraform, ARM templates, and Bicep.

  • Develop scripts/tools using PowerShell, Bash, Python, and Go.

  • Automate patching, backup validation, scaling, compliance checks, and recovery tasks.

  • Reduce manual operational toil through self-service automation.

Patching, Security & Compliance

  • Own enterprise patch management for Windows/Linux workloads using Azure Update Manager.

  • Manage maintenance windows and zero-downtime patch strategies.

  • Implement CIS benchmarks, vulnerability remediation, and audit compliance controls.

  • Secure workloads with Key Vault, Private Link, NSGs, Conditional Access, PIM, and Defender.

  • Support hybrid environments using Azure Arc-enabled servers.

Observability & Monitoring

  • Build and maintain monitoring platforms using:

      • Azure Monitor

      • Log Analytics

      • Application Insights

      • Grafana

      • Datadog

      • New Relic

      • Prometheus

  • Build executive dashboards, SRE scorecards, SLA reports, and capacity trends.

  • Tune alerts to reduce noise and improve actionable detection.

Customer Engagement & Leadership

  • Serve as primary technical contact for enterprise customers.

  • Present monthly service reviews, patch compliance, reliability metrics, and improvement plans.

  • Mentor L1/L2 engineers and guide technical escalations.

  • Collaborate with customer architects, security teams, and developers.

  • Lead cloud modernization and operational excellence initiatives.

Required Qualifications

Experience

  • 8-10 years in SRE, DevOps, Cloud Engineering, or Production Operations.

  • Minimum 5 years hands-on with Microsoft Azure production environments.

  • Proven experience managing critical enterprise workloads.

  • Strong customer-facing / managed services background preferred.

Technical Skills

Azure

  • Deep expertise in Azure compute, networking, storage, identity, monitoring, backup, and DR.

  • Strong hands-on experience with AKS, Azure DevOps, Azure Policy, and Key Vault.

DevOps / Automation

  • Terraform, ARM, Bicep, and CI/CD pipelines.

  • PowerShell, Bash, and Python scripting.

Containers

  • Kubernetes, Docker, and AKS operations.

Monitoring

  • Azure Monitor, Grafana, Datadog, Prometheus, and Log Analytics.

Operations

  • Incident management, RCA, patching, performance tuning, and DR drills.

Preferred Certifications

  • Microsoft AZ-104

  • AZ-305

  • AZ-400

  • AZ-500

  • Amazon Web Services Associate / Professional

  • CKA / Terraform Associate / SRE Foundation

Nice to Have

  • Multi-cloud (AWS / GCP) experience

  • Chaos engineering

  • FinOps knowledge

  • MSP / Managed Services experience

  • Large-scale enterprise operations

  • Security / Compliance frameworks (ISO 27001, SOC 2, HIPAA, PCI)




Required Experience:

Senior IC

About Company

Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leadi ...
