USD 440,000 - 685,000
1 Vacancy
As a Networking Engineer focused on WAN and LAN, you will play a critical role in developing, managing, and optimizing the front-end network components of OpenAI's supercomputing infrastructure.
Your expertise will ensure that our networks are fast, reliable, and scalable to meet the demands of training frontier AI models.
This includes managing both local (LAN) and long-distance (WAN) connectivity across our data centers, optimizing performance, and ensuring seamless communication between compute nodes and clusters. Finally, this also includes writing code to instrument and observe the network.
Our team primarily uses Python and some Rust, so familiarity with or interest in working with this stack is essential.
This role is based in San Francisco, CA, with a hybrid work model of 3 days per week in the office. Relocation assistance is available.
In this role, you will:
Design, manage, and optimize WAN and LAN infrastructure for OpenAI's supercomputers.
Develop and maintain data collection and monitoring systems to ensure network visibility and performance.
Troubleshoot and resolve network issues across TCP/IP, BGP, and the physical layer.
Automate network issue detection and resolution to reduce operational overhead.
Work closely with hardware and systems engineers to meet the performance demands of distributed AI training workloads.
You might thrive in this role if you:
Have 5 years of experience in networking or related infrastructure roles.
Possess strong expertise in networking technologies, protocols, and design principles.
Have hands-on experience with troubleshooting complex networking issues, including both LAN and WAN environments.
Deeply understand how to set up TCP/IP networks from scratch (e.g., BGP, ECMP routing, etc.).
Have a deep understanding of network protocols such as TCP/IP, BGP, and VLAN.
Are familiar with optical connectors and optical circuit switches (OCS).
Understand advanced concepts in routing, forwarding, and network management systems.
Have experience with telemetry, traffic engineering, and congestion management to optimize network performance.
Are skilled in collaborating across teams, combining technical expertise with excellent problem-solving and communication abilities.
Take ownership of problems end-to-end and maintain a commitment to continuous learning to effectively solve challenges.
Are familiar with InfiniBand, RoCE, or RDMA in HPC (High-Performance Computing) or similar environments.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.
OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement
For US-Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Full-Time