Neuron Collectives Software Engineer, Trainium Collectives

Amazon

Job Location:

Cupertino, CA - USA

Monthly Salary: Not Disclosed
Posted on: 4 days ago
Vacancies: 1 Vacancy

Department:

Software Development

Job Summary

As a Neuron Collectives Software Developer, you will:

* Enhance collective algorithms and topologies for optimal training performance
* Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization
* Monitor and analyze processor, DMA, firmware, and workload metrics
* Optimize collective operations to scale AI compute across the data center
* Work closely with the hardware team to co-optimize software and Trainium silicon
* Develop and optimize C/C++ implementations of collective communication patterns
* Investigate and implement improvements for specific training topologies used by modern LLMs
* Build and maintain analysis frameworks and automation solutions

The role offers opportunities to work on cutting-edge AI training hardware while contributing to one of Amazon's most critical initiatives.


A day in the life
Annapurna Labs, a crucial part of AWS, is responsible for developing hardware and software components for EC2 infrastructure. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.

We have mixed-discipline orgs: you'd be working side by side with infrastructure experts, hardware engineers, RTL engineers, scientists, and architects. Our workforce spans the globe and is truly international; you'll find yourself working side by side with individuals from numerous countries. We take mentorship seriously: you can expect senior mentorship, and you will be expected to mentor new and junior engineers.

The pace is fast as we work on the latest advancements in AI/ML, but we take the time to bond as a team and enjoy our successes. We offer flexible working hours and respect work-life balance (WLB) as a core org tenet. The team works with numerous principal-level engineers and closely with directors, so career growth opportunities are certainly available. This is a role where you will always be encouraged to keep learning; the AI/ML field is fast-moving and constantly evolving.

About the team
Annapurna Labs, part of AWS, created Trainium as a purpose-built AI training chip to revolutionize machine learning at Amazon scale. The Neuron Collectives team owns the software stack that enables collective operations, the communication primitives that allow AI training to scale across thousands of chips in the data center. Our work is essential to training the frontier models that power AI today. We work closely with hardware teams to extract maximum performance from Trainium, ensuring that compute and interconnect bandwidth are fully utilized. Our team sits at the intersection of hardware, firmware, and distributed systems.

Basic qualifications

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability, and scaling) of new and existing systems experience
- Experience programming with at least one software programming language

Preferred qualifications

- 3+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations
- Bachelor's degree in computer science or equivalent

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits, including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at

CA, Cupertino: 165,200.00 to 223,600.00 USD annually


Required Experience:

IC

About Company

Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...