SRE (Java Backend)
Sunnyvale, CA - USA
Job Summary
Skills
Core Java
Advanced Java
Advanced Java 8
Amazon Web Services (AWS)
Amazon Web Services EKS (AWS EKS)
Kubernetes
DevOps / SRE
Key Responsibilities
- Architect and drive large-scale migrations of business-critical services to AWS and Kubernetes-based platforms
- Define and implement GitOps-first deployment strategies using ArgoCD with Spinnaker for advanced delivery workflows
- Design, build, and operate production-grade AWS EKS platforms at scale
- Establish best practices for CI/CD, deployment automation, and release strategies (blue/green, canary, progressive delivery)
- Design and maintain reusable Helm charts and standardized deployment patterns
- Develop and maintain Python-based tooling and automation for deployment operations and reliability
- Provide deep Linux systems expertise, including performance tuning, debugging, and incident mitigation
- Own and support production systems, including on-call participation, incident response, and root cause analysis
- Partner with SRE and Security teams to embed reliability, scalability, and security into platform design
- Drive architectural reviews, author design documents, and influence long-term platform and migration roadmaps
- Mentor engineers and raise the bar for DevOps and platform engineering practices
Minimum Qualifications
- 10 years of experience as a Cloud / DevOps / Platform Engineer supporting production systems
- Proven experience leading AWS migrations for large, high-traffic, business-critical platforms
- Strong hands-on expertise with:
  - Linux systems (performance tuning, networking, troubleshooting)
  - Python for automation, tooling, and operational workflows
  - AWS (EKS, VPC, IAM, EC2, ALB/NLB, CloudWatch, S3, RDS)
  - Kubernetes (EKS) in production environments
  - ArgoCD and GitOps deployment models
  - Spinnaker for continuous delivery
  - Helm for application packaging and release management
- Experience operating and supporting production environments with on-call responsibility
- Experience with Infrastructure as Code (Terraform and/or CloudFormation)
- Strong understanding of distributed systems, networking, and cloud security
- Ability to lead through influence and collaborate across engineering disciplines
Preferred Qualifications
- Familiarity with Akamai CDN, caching strategies, and edge delivery patterns
- Experience with Redis (caching, replication, high availability)
- Experience with Kafka or other distributed messaging systems
- Experience operating platforms at scale (hundreds of services, multi-team environments)
- Experience with observability platforms (Prometheus, Grafana, OpenTelemetry, Splunk, Datadog)
- Familiarity with SRE practices, including SLIs, SLOs, error budgets, and incident response
- Experience with service mesh technologies (Istio, Linkerd)
- Strong written and verbal communication skills, including technical design documentation and executive-level discussions