Principal Firmware Engineer, Annapurna Labs ML Acceleration Systems Software
Austin, TX - USA
Job Summary
Our team designs and builds Annapurna's fleet of Accelerated Servers using internally designed silicon. We solve systemic hardware issues, and we build hardware and software systems to detect and mitigate future recurrences of failures so that our customers can experience the highest quality of service possible!
In this role, you will lead an organization of software and firmware developers to build reliable server firmware deployed across millions of accelerators in EC2. You will build AI-driven software tooling that root-causes system failures, work that directly impacts how our customers leverage AWS Trainium for their machine learning workloads.
Key job responsibilities
In this role, you will lead a team of software and firmware developers to design and develop server software at AWS scale. You'll collaborate with hardware developers and software engineers to design validation strategies that ensure reliability across our entire product line. Your days will include mentoring your team through complex technical challenges, establishing operational procedures that scale across products, and working cross-functionally to integrate design-for-excellence principles into our development process. You'll also participate in technical discussions that shape how we approach system design and validation, ensuring we're catching issues before they reach customers.
This is a fast-paced, intellectually challenging position, and you'll work with thought leaders in multiple technology areas. You'll have high standards for yourself and everyone you work with, and you'll be constantly looking for ways to improve your product's performance, quality, and cost. Using data and key metrics, you will also drive and measure process improvements that enhance our operational effectiveness.
A day in the life
Your day-to-day responsibilities will include interfacing with our internal and external customers to understand project requirements and facilitate system development on top of your server design. You will be responsible for learning the operational challenges of our existing fleet, with the goal of improving the current customer experience as well as developing improved systems for future designs. You will work directly with vendors and ODM/JDM design teams to develop and manufacture your product at scale.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough but kind design reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
We're a collaborative group of software engineers and hardware developers united by a shared mission: making Amazon Trainium products more reliable and easier to troubleshoot. Our team values partnership across disciplines, and your success depends on building strong relationships with hardware specialists, validation engineers, and other technical leaders. We're focused on establishing best-in-class operational procedures and diagnostic capabilities that set the standard for the industry. By joining us, you'll help shape the future of how we approach system reliability and contribute to products that power some of the most demanding machine learning applications in the world.
- 7 years of experience working directly with engineering teams
- Experience managing programs across cross-functional teams, building processes, and coordinating release schedules
- Experience building and evaluating system-level technical design
- Bachelor's degree in Computer Science, Computer Engineering, or related fields
- Experience managing teams, or experience as a mentor, tech lead, or leader of an engineering team
- Experience in software development or in troubleshooting and debugging technical systems, with strong analytical skills, attention to detail, and effective communication abilities
- Experience with hardware/software integration and real-time systems
- 10 years of systems software or firmware engineering experience
- Proficiency with programming languages commonly used in systems software (such as C, C++, Rust, or Python)
- 5 years of experience in project management disciplines, including scope, schedule, budget, and quality, along with risk and critical path management
- Experience managing projects across cross-functional teams, building sustainable processes, and coordinating release schedules
- Experience defining KPIs/SLAs used to drive multi-million-dollar businesses and reporting to senior leadership
- Master's degree in Computer Science, Computer Engineering, or related fields
- Experience troubleshooting and debugging technical systems
- 5 years of embedded firmware development experience
- Knowledge of data center infrastructure design, operations, or delivery
- Experience navigating a knowledge base and following Standard Operating Procedures (SOPs)
- Experience with AI or machine learning applications in systems engineering
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits, including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at
TX, Austin - 144,100.00 - 194,900.00 USD annually
Required Experience:
Staff IC