Software Development Engineer (Elastic Kubernetes Service), EKS Scalability & Performance

Amazon


Job Location:

Seattle, WA - USA

Monthly Salary: Not Disclosed
Posted on: 10 days ago
Vacancies: 1 Vacancy

Department:

Software Development

Job Summary

We are looking for a Software Development Engineer to join the EKS KCP Scalability team and work on some of the hardest distributed systems problems at Amazon. You will design, build, and operate systems that directly determine whether EKS customers, from startups to the largest AI/ML workloads on the planet, experience a reliable, performant control plane.

This is not a role where you implement features in isolation. You will work across the full stack: from the Kubernetes API server process and upstream community engagement, through autoscaling services that right-size control planes in real time, to the SLA measurement pipelines that hold us accountable to our customers. You will own systems end-to-end, from design through production operations, and your work will be measured by customer outcomes, not lines of code.

Key job responsibilities
You will build and operate the Vertical Auto-Scaling Service (VAS) and its next-generation successor (VAS 2.0), which dynamically right-sizes EKS control planes by evaluating CPU/memory utilization, etcd throttle rates, node-count thresholds, and network utilization simultaneously. You will work on the SLA measurement pipeline (MinutelySLA, DailySLA, MonthlySLA) that enforces EKS's uptime commitments, investigating breaching clusters weekly and building automation to detect and mitigate degradation before customers notice.

You will contribute to the control plane architecture for EKS Ultraclusters, defining how the API server, etcd, and associated components scale to support 100,000-node clusters running generative AI workloads. You will maintain and extend version release qualification scale tests that gate every new Kubernetes version before it reaches customers. You will engage with the upstream Kubernetes community, driving KEPs that work backwards from EKS customer requirements around performance, scale, and resiliency.

Depending on your interests and the team's priorities, you may also work on workload identity systems (IRSA, EKS Pod Identity), Cluster Access Management, EC2 capacity management and grey failure detection, or Large-Scale Event response and weight shifting.

About the team
The EKS KCP Scalability organization owns the performance, availability, and autoscaling of the Kubernetes control plane powering Amazon EKS, from small development clusters to 100,000-node Ultraclusters running generative AI workloads. We ensure every EKS cluster operates within its contracted SLA and delivers predictable, high-performance behavior at any scale.

Our charter spans three domains: Performance and Availability, Autoscaling, and Auth. We operate at the intersection of distributed systems, Kubernetes internals, and AWS infrastructure, building systems that scale to hundreds of thousands of clusters globally.



Basic Qualifications
- 3 years of non-internship professional software development experience
- 2 years of non-internship design or architecture (design patterns, reliability, and scaling) of new and existing systems experience
- 1 year of software development engineer or related occupational experience
- 1 year of experience designing and developing large-scale, multi-tiered, multi-threaded, embedded or distributed software applications, tools, systems, and services using C#, C, Java, or Perl
- 1 year of Object Oriented Design experience
- Bachelor's degree or foreign equivalent in Computer Science, Engineering, Mathematics, or a related field
- Experience programming with at least one software programming language

Preferred Qualifications
- 3 years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations
- Bachelor's degree in computer science or equivalent

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at

WA Seattle - 143,700.00 - 194,400.00 USD annually


Required Experience:

IC


About Company


Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...
