System Software Engineer, First-Party Hardware
San Francisco, CA - USA
Job Summary
About the Team
OpenAI's Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI's supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.
About the Role
We're seeking a System Software Engineer to join our First-Party Hardware team. In this role, you will design, build, integrate, and validate low-level system software for the manageability and health of OpenAI's first-party AI hardware systems.
You will work across BMC, Linux, firmware interfaces, automation infra, boot and recovery, hardware diagnostics, telemetry, host and platform drivers, network software interfaces, and manufacturing and fleet readiness. A major part of this role is owning the acceptance path for partner-delivered system software: defining requirements, reviewing code and artifacts, reproducing builds, building tests, pushing fixes, and producing the evidence needed for launch decisions.
This role is hands-on and high-ownership. You will write and review low-level software, debug issues across hardware and software boundaries, build infra and automation to test and manage devices in the lab, guide partner deliverables, build validation evidence, and help carry platforms from bring-up through production deployment.
Location: San Francisco, CA (Hybrid: 3 days/week onsite)
Relocation assistance available.
In this role you will:
Design, develop, and maintain low-level firmware and system software for first-party AI hardware manageability, including BMC software, Redfish services, gNMI telemetry, firmware update and recovery flows, BIOS/UEFI interactions, platform drivers, and hardware diagnostics.
Own integration and acceptance of partner and vendor software releases, including requirements, code and artifact review, reproducible builds, CI regression monitoring, version tracking, acceptance criteria, and launch-readiness evidence.
Build and maintain automation and CI infra for testing and managing systems in our lab.
Define and debug hardware management protocols across accelerators, host systems, management controllers, firmware, and platform services, including interfaces such as I2C, SMBus, PMBus, PCIe, Ethernet, GPIO, UART, and JTAG.
Build system health monitoring, telemetry, remote diagnostics, and recovery paths that make hardware failures diagnosable in the lab, at manufacturing partners, and in production data centers.
Develop validation and test automation for board bring-up, rack bring-up, qualification, manufacturing readiness, deployment readiness, and long-term reliability.
Convert engineering releases into manufacturing-ready software recipes: images, versions, logs, limits, remediation mapping, provisioning hooks, secure artifact handling, and traceable data export.
Debug complex production issues spanning hardware signals, BMC firmware, BIOS/UEFI, kernel drivers, platform services, network topology, PCIe behavior, power, thermals, boot, provisioning, and manufacturing test.
Partner with hardware, firmware, security, networking, infrastructure, manufacturing, operations, and external engineering teams to define software contracts, unblock bring-up, and drive issues to closure.
Produce durable architecture notes, runbooks, validation records, and decision documents that help OpenAI and partner teams reproduce, operate, and improve the platform.
You might thrive in this role if you:
Have 7 years of hands-on experience, or exceptional accomplishments demonstrating equivalent expertise, in low-level system software, embedded software, firmware, BMC software, platform software, device drivers, or hardware diagnostics.
Have strong programming skills in C, C++, Rust, or similar systems languages, with experience building reliable software for real hardware.
Have experience with Linux-based hardware platforms, embedded Linux, OpenBMC, Redfish, BMCWeb, IPMI boundaries, BIOS/UEFI, bootloaders, firmware update systems, kernel drivers, RTOS, or fleet management software.
Have strong knowledge of hardware/software interfaces such as I2C, SMBus, PMBus, SPI, PCIe, Ethernet, USB, UART, GPIO, JTAG, power controllers, board-level debug tools, or protocol analyzers.
Have a demonstrated ability to debug live hardware using logs, packet captures, firmware traces, bus captures, lab hosts, BMC journals, Linux tooling, and carefully controlled experiments.
Have experience with hardware bring-up, manufacturing or qualification testing, system diagnostics, release validation, or deployment of high-performance compute, accelerator, server, networking, storage, or embedded platforms.
Can reason across software, firmware, hardware, manufacturing, and operations boundaries, and turn ambiguous problems into clear requirements, designs, tests, and decisions.
Have a proven track record working with external vendors, manufacturing partners, or partner engineering teams to define deliverables, review technical work, and drive issues to closure.
Are familiar with platform security topics such as secure boot, firmware signing, device provisioning, attestation, certificate handling, trusted update flows, or access-control design (a plus).
To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.
Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse, and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss, or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.
To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Required Experience: IC
About Company
We believe our research will eventually lead to artificial general intelligence, a system that can solve human-level problems. Building safe and beneficial AGI is our mission.