DevOps Site Reliability Engineer (SRE)

MavensTCL


Job Location: Garhwa - India

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1

Job Summary

You will design, implement, and maintain Valuware's cloud infrastructure on AWS/GCP using Kubernetes. You'll build CI/CD pipelines, set up monitoring (Prometheus, Grafana, Loki), manage database backups and disaster recovery, and ensure 99.95% uptime across all 28 modules.

Key Responsibilities / Duties
Infrastructure
Design and manage Kubernetes clusters (EKS/GKE) with multiple node pools

Implement Infrastructure as Code using Terraform

Manage cloud resources (EC2, RDS, S3, MSK, ElastiCache)

Configure VPC networking, security groups, and load balancers

Implement auto-scaling policies (HPA, Karpenter)

CI/CD

Build and maintain CI/CD pipelines (GitHub Actions, ArgoCD)

Implement blue/green and canary deployment strategies

Manage Helm charts for all microservices

Automate rollback procedures

Monitoring & Observability

Set up Prometheus for metrics collection

Configure Grafana dashboards for system health

Implement Loki for log aggregation

Set up distributed tracing with Jaeger/Tempo

Configure alerting (PagerDuty, Opsgenie)

Security & Compliance

Implement secrets management with HashiCorp Vault

Configure network policies and service meshes (Istio)

Perform container vulnerability scanning (Trivy)

Implement backup and disaster recovery (RTO < 4 hours, RPO < 5 minutes)

Required Skills & Qualifications

Must-Have (5 years overall)
Skill               Proficiency  Notes
Kubernetes          3 years      EKS/GKE, Helm, HPA
AWS                 3 years      EC2, RDS, S3, EKS, MSK
Terraform           2 years      Infrastructure as Code
CI/CD               3 years      GitHub Actions, ArgoCD
Prometheus/Grafana  2 years      Monitoring stack
Docker              4 years      Containerization
Linux               5 years      Shell scripting

Preferred / Good-to-Have Skills
Skill            Why It Matters
Istio / Linkerd  Service mesh
Loki / Tempo     Logging and tracing
Vault            Secrets management
Trivy / Falco    Container security
Python / Go      Automation scripting
Cloudflare       CDN, WAF

SLO / SLA Targets
API Gateway: 99.99%
Listing Service: 99.95%
Transaction Service: 99.99%
AI Service: 99.90%
Database: 99.99%
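
To give a sense of what these targets mean in practice, the sketch below converts each availability percentage into an allowed-downtime budget. It is only an illustration: the 30-day measurement window and the helper name downtime_budget_minutes are assumptions, not something stated in this posting.

```python
# Illustrative only: the 30-day window is an assumption, not part of the posting.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes

# Availability targets as listed under "SLO / SLA Targets".
SLO_TARGETS = {
    "API Gateway": 99.99,
    "Listing Service": 99.95,
    "Transaction Service": 99.99,
    "AI Service": 99.90,
    "Database": 99.99,
}


def downtime_budget_minutes(target_percent, window_minutes=MINUTES_PER_30_DAYS):
    """Return the minutes of downtime allowed by an availability target (hypothetical helper)."""
    return window_minutes * (100 - target_percent) / 100


for service, target in SLO_TARGETS.items():
    budget = downtime_budget_minutes(target)
    print(f"{service}: {target}% allows {budget:.2f} minutes of downtime per 30 days")
```

Under that assumed window, 99.99% leaves roughly 4.32 minutes of downtime, 99.95% roughly 21.6 minutes, and 99.90% roughly 43.2 minutes.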


Interview Process
Round 1: Kubernetes, Docker fundamentals (60 minutes)
Round 2: AWS, Terraform (60 minutes)
Round 3: Monitoring, incident response (45 minutes)
Round 4: Hiring Manager (45 minutes)
