Software Engineer - AI/ML, AWS Neuron Inference

AWS Neuron


Job Location: Seattle - USA
Yearly Salary: $129,300 - $223,600
Posted on: 30+ days ago
Vacancies: 1

Department: Software Development

Job Summary

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer on the Machine Learning Inference Applications team, responsible for the development and performance optimization of the core building blocks of LLM inference: attention, MLP, quantization, speculative decoding, Mixture of Experts, and more.

The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.

Key job responsibilities
Responsibilities of this role include adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open-source and internally developed models. Working across teams and organizations is key.

About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough but kind code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise, so they feel empowered to take on more complex tasks in the future.

Basic Qualifications

- 3 years of non-internship professional software development experience
- 2 years of non-internship design or architecture (design patterns, reliability, and scaling) of new and existing systems experience
- Programming proficiency in Python or C++ (at least one required)
- Experience with PyTorch
- Working knowledge of machine learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
- Strong understanding of system performance, memory management, and parallel computing principles

Preferred Qualifications

- Experience with JAX
- Experience with debugging, profiling, and implementing software engineering best practices in large-scale systems
- Expertise with PyTorch JIT compilation and AOT tracing
- Experience with CUDA kernels or equivalent ML/low-level kernels
- Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
- Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments
- Deep understanding of computer architecture, operating systems, and parallel computing

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit
This position will remain posted until filled. Applicants should apply via our internal or external career site.


Key Skills

  • Spring
  • .NET
  • C/C++
  • Go
  • React
  • OOP
  • C#
  • Data Structures
  • JavaScript
  • Software Development
  • Java
  • Distributed Systems

About Company


Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...
