Job Summary
You will design, implement, and maintain Valuware's cloud infrastructure on AWS/GCP using Kubernetes. You'll build CI/CD pipelines, set up monitoring (Prometheus, Grafana, Loki), manage database backups and disaster recovery, and ensure 99.95% uptime across all 28 modules.
Key Responsibilities / Duties
Infrastructure
Design and manage Kubernetes clusters (EKS/GKE) with multiple node pools
Implement Infrastructure as Code using Terraform
Manage cloud resources (EC2, RDS, S3, MSK, ElastiCache)
Configure VPC networking, security groups, and load balancers
Implement auto-scaling policies (HPA, Karpenter)
CI/CD
Build and maintain CI/CD pipelines (GitHub Actions, ArgoCD)
Implement blue/green and canary deployment strategies
Manage Helm charts for all microservices
Automate rollback procedures
Monitoring & Observability
Set up Prometheus for metrics collection
Configure Grafana dashboards for system health
Implement Loki for log aggregation
Set up distributed tracing with Jaeger/Tempo
Configure alerting (PagerDuty, Opsgenie)
Security & Compliance
Implement secrets management with HashiCorp Vault
Configure network policies and service meshes (Istio)
Perform container vulnerability scanning (Trivy)
Implement backup and disaster recovery (RTO < 4 hours, RPO < 5 minutes)
Required Skills & Qualifications
Must-Have (5 years overall)
Skill | Proficiency | Notes
Kubernetes | 3 years | EKS/GKE, Helm, HPA
AWS | 3 years | EC2, RDS, S3, EKS, MSK
Terraform | 2 years | Infrastructure as Code
CI/CD | 3 years | GitHub Actions, ArgoCD
Prometheus/Grafana | 2 years | Monitoring stack
Docker | 4 years | Containerization
Linux | 5 years | Shell scripting
Preferred / Good-to-Have Skills
Skill | Why It Matters
Istio / Linkerd | Service mesh
Loki / Tempo | Logging and tracing
Vault | Secrets management
Trivy / Falco | Container security
Python / Go | Automation scripting
Cloudflare | CDN, WAF
SLO / SLA Targets
API Gateway: 99.99%
Listing Service: 99.95%
Transaction Service: 99.99%
AI Service: 99.90%
Database: 99.99%
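To make the availability targets above more concrete, the sketch below converts each percentage into the downtime it permits. It is a minimal illustration that assumes a 30-day measurement window; the window length, the `SLO_TARGETS` mapping, and the `allowed_downtime_minutes` helper are hypothetical and not taken from any existing tooling.

```python
# Hypothetical helper: convert an availability target (e.g. 99.95%) into the
# maximum downtime it allows. Assumption: a 30-day measurement window.
SLO_TARGETS = {
    "API Gateway": 99.99,
    "Listing Service": 99.95,
    "Transaction Service": 99.99,
    "AI Service": 99.90,
    "Database": 99.99,
}

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(target_percent: float,
                             window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Return the downtime budget, in minutes, for a target over the window."""
    return window_minutes * (1 - target_percent / 100)


if __name__ == "__main__":
    # Print the downtime budget for each service target listed above.
    for service, target in SLO_TARGETS.items():
        budget = allowed_downtime_minutes(target)
        print(f"{service}: {target}% -> {budget:.1f} min per 30 days")
```

Under this assumption, 99.99% allows about 4.3 minutes of downtime per 30 days, 99.95% about 21.6 minutes, and 99.90% about 43.2 minutes.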
Interview Process
Round 1 (Kubernetes and Docker fundamentals): 60 minutes
Round 2 (AWS and Terraform): 60 minutes
Round 3 (Monitoring and incident response): 45 minutes
Round 4 (Hiring manager): 45 minutes