Systems Development Engineer, Research Compute Platform, Fauna
New York City, NY - USA
Job Summary
This role requires both strong systems engineering fundamentals and genuine comfort working alongside researchers. The ideal candidate is as happy diagnosing a GPU thermal fault as they are designing a job scheduler, and treats "the scientist's training run just works" as the north star for everything they build.
Key job responsibilities
- Own on-prem GPU compute end-to-end: provisioning, imaging, driver and CUDA management, monitoring, failure diagnosis, hardware RMA, and capacity planning
- Build and operate a job scheduling layer (Slurm, Ray, SkyPilot, or equivalent) so scientists submit training runs without managing individual machines
- Design and implement the bridge between on-prem and cloud compute
- Partner directly with ML scientists to triage training issues, profile workloads, identify bottlenecks, and advise on how to structure training for the hardware at hand
About the team
Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces.
We believe that future won't arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We're changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products.
Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality but experience. We're building robots that feel responsive, expressive, and genuinely useful.
At Fauna, you'll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It's an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build.
If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we're interested in hearing from you.
Qualifications
- 3 years of Linux systems administration experience
- 3 years of non-internship professional systems engineering or systems development experience
- Experience with configuration management and fleet automation (Ansible, Chef, or equivalent)
- Experience with containerization in production (Docker required; Kubernetes or containerd exposure preferred)
- Proficiency in Python, Go, or Bash for systems tooling and automation
- Experience with NVIDIA GPU infrastructure: driver management, CUDA versioning, basic GPU diagnostics
- Experience with job schedulers or orchestrators (Slurm, Ray, SkyPilot, Kubernetes with GPU operator, or equivalent)
- Hardware comfort: diagnosing and replacing GPUs, PSUs, memory, storage
- NVIDIA deep fluency: DCGM, NVLink / PCIe topology, IOMMU, compute mode configuration
- Experience with GPU cloud providers (AWS p5/g6e, RunPod, Lambda, CoreWeave) for hybrid on-prem/cloud workflows
- Track record of building internal platforms that accelerate other engineers or scientists
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at
NY, New York: 142,300.00 - 192,400.00 USD annually
Required Experience:
IC
About Company
Free shipping on millions of items. Get the best of Shopping and Entertainment with Prime. Enjoy low prices and great deals on the largest selection of everyday essentials and other products, including fashion, home, beauty, electronics, Alexa Devices, sporting goods, toys, automotive ...