SoC Firmware Engineering Manager, Annapurna Labs Machine Learning Acceleration, AWS
Cupertino, CA - USA
Job Summary
Our SoC HAL (Hardware Abstraction Layer) team owns the lowest layer of user-space software on AWS's custom ML accelerator chips: the firmware that boots, configures, and manages every hardware block on the SoC. Your software runs as a shared library on embedded Linux, reaching into the chip to program PCIe links, initialize HBM controllers, configure PLLs, manage interrupt controllers, and orchestrate fabric interconnects across 270 hardware block instances per chip, all deployed across millions of servers in AWS's global fleet.
Tech stack: C++17, CMake, GoogleTest, Python, SystemVerilog DPI, SPI, APB/AXI bus protocols, PCIe, UCIe, HBM, PLL, custom IPs
As the SoC Firmware Manager, you will:
- Manage, coach, and grow a team of 6 engineers: set technical direction, own hiring, and create an environment where strong engineers want to stay
- Coordinate deliverables across chip architects, RTL designers, verification engineers, validation engineers, and platform software teams; you're the single point of accountability for HAL readiness on every new chip program
- Own bring-up for new SoC tape-outs, from first-silicon power-on through production fleet deployment
- Prioritize work across multiple concurrent chip programs and customer teams, balancing urgent bring-up needs against long-term architecture investments
- Drive the architecture of our C++ template metaprogramming framework, BUTR (Built-in Unit Test for Registers), and HITL (Hardware-in-the-Loop) test infrastructure
- Ship the same C++ codebase to three execution environments: SystemVerilog DPI for chip verification, QEMU for emulation, and Carbon OS on embedded microcontrollers for production fleet
- Get into the weeds alongside your team: debug register-level HW/SW interactions, review code, and write code yourself when it matters
Most firmware teams target one platform and ship to a few thousand units. We target three platforms from a single source tree and deploy across AWS's global fleet, where a single register misconfiguration can impact millions of servers. Our software must be stateless, survive live-updates on running production servers without reboots, and be correct down to individual register bits. The microcontroller can reboot at any time, including during customer workloads, and the HAL must resume managing the SoC by querying hardware state on-demand. No cached state, no assumptions.
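To make the "no cached state" idea concrete, here is a minimal C++17 sketch, not the team's actual code. The register addresses, bit positions, the `FakeBus` type, and both function names are hypothetical and invented for illustration. Each operation re-reads the hardware registers instead of remembering earlier results, so it behaves the same after a controller reboot.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical in-memory stand-in for a register bus. A real HAL would issue
// APB/AXI reads and writes here; unwritten registers read back as 0.
struct FakeBus {
    std::unordered_map<std::uint32_t, std::uint32_t> regs;

    std::uint32_t read(std::uint32_t addr) const {
        auto it = regs.find(addr);
        return it == regs.end() ? 0u : it->second;
    }
    void write(std::uint32_t addr, std::uint32_t value) { regs[addr] = value; }
};

// Hypothetical PLL register layout (assumed, not from any real chip).
constexpr std::uint32_t kPllCtrlAddr   = 0x1000;
constexpr std::uint32_t kPllEnableBit  = 1u << 0;
constexpr std::uint32_t kPllStatusAddr = 0x1004;
constexpr std::uint32_t kPllLockedBit  = 1u << 0;

// Stateless query: the answer always comes from the status register,
// never from a value remembered by software.
template <typename Bus>
bool pll_is_locked(const Bus& bus) {
    return (bus.read(kPllStatusAddr) & kPllLockedBit) != 0;
}

// Idempotent configuration: read the control register first and write only
// if the enable bit is clear, so repeating the call after a reboot is safe.
template <typename Bus>
void pll_ensure_enabled(Bus& bus) {
    const std::uint32_t ctrl = bus.read(kPllCtrlAddr);
    if ((ctrl & kPllEnableBit) == 0) {
        bus.write(kPllCtrlAddr, ctrl | kPllEnableBit);
    }
}
```

Templating on the bus type is one possible way to keep the same logic portable across a simulator, an emulator, and real hardware; whether the actual framework works this way is an assumption.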
Your pre-silicon software runs in simulation and emulation months before real silicon arrives. When the chip comes back from the fab, you validate those predictions on real hardware, and when they don't match, you figure out whether it's a silicon bug or a software bug. For Trainium3, our HAL enabled a full ML training workload within 12 hours of first power-on. No ML background needed. Your firmware is the foundation that enables ML training across clusters of thousands of interconnected accelerators; you'll work on components like PCIe and HBM but won't need to understand ML itself.
This role can be based in Cupertino, CA or Austin, TX. The team is split between the two sites.
- 3 years of engineering team management experience
- 7 years of professional software development in C or C++, including embedded firmware or systems-level development
- 4 years of designing or architecting software systems (abstraction layers, hardware/software interfaces)
- Experience developing software that interfaces directly with hardware: SoC, ASIC, FPGA, or embedded microcontrollers
- Experience with register-level programming and hardware debug (waveform analysis, bus-level tracing, or similar)
- Experience in recruiting, hiring, mentoring/coaching, and managing teams of Software Engineers to improve their skills and make them more effective product software engineers
- Experience with silicon bring-up or pre/post-silicon software validation
- Experience shipping software across multiple target platforms (simulation, emulation, production hardware)
- Familiarity with bus protocols (APB, AXI, PCIe) or memory subsystems (HBM, DDR)
- Experience with C++ template metaprogramming or code generation frameworks
- Experience building or maintaining hardware abstraction layers or board support packages
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at
USA, CA, Cupertino - 212,700.00 - 287,700.00 USD annually
USA, TX, Austin - 184,900.00 - 250,200.00 USD annually
Required Experience:
Manager