Senior Software Engineer (ML), Data Plane
Job Summary
Our work covers the full inference path: integrating serving engines with custom hardware, developing high-performance compute kernels, enabling efficient data movement, and driving models from early validation through production. We operate at frontier scale with large distributed models.
This is a ground-up effort with rapidly evolving hardware and software. We need a senior IC who can write and optimize low-level code for custom hardware, validate model architectures end-to-end, build test and profiling infrastructure, and drive performance across the stack.
Key job responsibilities
- Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model inference.
- Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end, from PyTorch model definition through distributed execution on custom hardware.
- Integrate custom accelerator backends into open-source ML serving frameworks (vLLM, PyTorch), including scheduler extensions, memory management, and model parallelism.
- Build and maintain test infrastructure for model correctness validation across CPU, GPU, simulator, and hardware targets.
- Profile and optimize inference workloads: identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bring-up.
- Own features end-to-end, from design through implementation, testing, and integration into the broader software stack.
- Contribute to CI/CD pipelines that gate model and kernel changes on correctness and performance regressions.
- Mentor engineers, drive design reviews, and raise the engineering bar across the team.
- Bachelor's degree in computer science or equivalent
- 7 years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations
- Knowledge of machine learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
- Knowledge of computer architecture, operating systems, and parallel computing
- Strong proficiency in C/C++
- Strong Linux systems knowledge
- Experience developing compute kernels for GPUs, DSPs, or custom accelerators
- Proven track record of owning and delivering complex software features end-to-end
- Knowledge of ML frameworks, including JAX, PyTorch, vLLM, SGLang, Dynamo, TorchXLA, and TensorRT
- Experience developing and deploying LLMs in production on GPUs, Neuron, TPU, or other AI acceleration hardware, or experience with CUDA kernels or ML/low-level kernels
- Familiarity with speculative decoding, KV cache optimization, or other LLM serving optimizations
- Experience with distributed systems: collective communication, RDMA, or high-speed interconnect programming
- Experience with hardware simulation environments and model validation workflows
- Demonstrated early adoption of AI-assisted development tools, using LLMs or code-generation agents as part of a daily workflow
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
Required Experience:
Senior IC
About Company
Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...