Job Title: Senior Specialist Cloud SRE
Education: Bachelor's Degree
Experience: 8 years
Location: Mumbai
As a Senior SRE Engineer (Cloud SRE Specialist), you will be responsible for ensuring the reliability, scalability, performance, and cost optimization of cloud services across AWS, Azure, and multi-cloud environments. You will act as the primary technical lead for assigned customers, manage incident escalations, drive automation-first practices, and mentor junior engineers. You will also collaborate closely with development teams to embed resilience and observability into applications.
Key Responsibilities:
Customer Leadership & Collaboration:
Serve as the primary technical point of contact for assigned customer accounts.
Provide regular updates and lead initiatives to improve customer environments.
Know assigned accounts well enough to make tactical decisions without escalation.
Collaborate with customer development teams to align infrastructure with application requirements.
Incident & Problem Management:
Lead incident response and postmortems, ensuring corrective and preventive measures are implemented.
Be the Tier 3 escalation point for offshore/onshore SRE teams.
Perform Root Cause Analysis (RCA) and validate the quality of work done by Tier 2 engineers.
Develop and maintain incident response plans for security breaches and operational incidents.
Reliability Engineering:
Define and maintain SLIs/SLOs, track error budgets, and monitor alignment.
Participate in architecture discussions for high availability, disaster recovery, and scalability.
Integrate resilience patterns such as circuit breakers, retries, and bulkheads.
Use chaos engineering and fault injection practices where applicable.
Automation & Infrastructure as Code:
Automate infrastructure and operations tasks using Terraform, CloudFormation, and AWS CDK.
Build and maintain CI/CD pipelines with canary deployments and blue/green strategies.
Implement automation workflows with AWS Lambda, Step Functions, and Azure Functions.
Monitoring & Observability:
Implement observability systems using Prometheus, Grafana, OpenTelemetry, ELK, and Jaeger.
Configure proactive monitoring and alerts using AWS CloudWatch and Azure Monitor.
Ensure visibility into metrics, traces, and logs for troubleshooting.
Cloud Infrastructure Management:
Provision and manage VMs, storage, networking, VPNs, and ExpressRoute/Peering.
Manage patching, backups, encryption and decryption, and images.
Optimize performance and cost via rightsizing, autoscaling, and reserved instances.
Manage identity and access controls (AWS IAM, Azure AD, RBAC).
Security & Compliance:
Implement and enforce security best practices across multi-cloud environments.
Ensure compliance with GDPR, HIPAA, and industry regulations.
Conduct regular audits and compliance reporting.
Mentoring & Knowledge Sharing:
Coach and mentor Tier 2 and junior SREs.
Conduct reliability-focused design reviews.
Maintain up-to-date documentation, runbooks, and SOPs.
Required Experience:
Senior IC
Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leadi ...