Senior Optical DC Engineer

Oracle


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 7 days ago
Vacancies: 1 Vacancy

Job Summary

Description

AI2CNE strives to be a global leader in the RDMA cluster networking domain and to enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence, and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads.

We strive to be the go-to experts in RDMA cluster network architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This role supports the design, deployment, and operations of large-scale global Oracle Cloud Infrastructure (OCI). It is primarily focused on the development and support of high-speed fiber optic network fabric links and systems. The work combines a deep understanding of optical cables of various types (patch cords, shuffle, bulk/trunk, etc.) and high-speed optical transceivers for interconnects in leaf-spine RDMA cluster networks, at the L0/L1 physical layer and the L2 protocol level, with troubleshooting and automation/programming skills. As OCI is a cloud-based network with a global footprint, this support will include millions of optical links for hundreds of thousands of network devices supporting millions of servers connected over a mix of dedicated backbone infrastructure, CLOS network, and the Internet.



Responsibilities

Collaborate with engineers from the L1 optical engineering team, the network design, delivery, AI Ops, DC Ops, and DC build teams, and program/project managers to develop milestones and deliverables for validating the build quality of optical cabling and optical transceivers in AI data center builds against the OCI standards for RDMA backend networks.

  • Primarily use existing procedures and tools to develop and safely execute DC network builds and changes, developing new procedures from time to time as needed.
  • Provide break-fix support for optical links to meet RDMA cluster performance criteria (pre-FEC BER, Rx power, FEC bin, BOL and EOL margins, etc.).
  • Serve as the escalation point for event remediation and lead post-event root cause analysis.
  • Frequently develop MPOs or scripts to automate routine tasks for the team and business units and to improve the quality of builds.
  • Support dashboard builds by providing requirements for representing data by L1 layer and device role, helping to identify link-level issues and anomalies such as link flaps and link-down events.
  • Serve as SME on data center build standards for the DC build environment, including installation and troubleshooting of optical cabling and optical transceivers.
  • Participate in AI DC deployment rotations at DC build sites, with up to 50% international/domestic travel, to support build-site quality control and optical link validation for new clusters, and provide recommendations to various teams for improvement and enforcement.
  • Support Ops to stabilize RDMA networks after turn-up.


Qualifications

Career Level - IC4




Required Experience:

Senior IC

About Company

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...
