Principal GPU/CPU Systems Engineer

Oracle

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 14 hours ago
Vacancies: 1 Vacancy

Job Summary

Description

Required Qualifications

  • 10 or more years of experience in hardware design, system engineering, and platform bring-up.
  • Hands-on experience with market-leading GPUs or AI platforms spanning development, bring-up, test, and characterization.
  • Strong knowledge of AI/GPU and/or AI/CPU platform architectures and capabilities.
  • Experience evaluating system architectures, platform definitions, and implementation paths.
  • Ability to balance hardware performance, power, cost, regulatory, and cross-functional requirements.
  • Experience with modern server platforms across x86 and ARM architectures.
  • Hardware development experience at the system, board, and FPGA levels.
  • Proficiency reviewing hierarchical schematics, advanced multilayer board layouts, and end-to-end interconnects.
  • Strong understanding of firmware and system diagnostics using BMC firmware, UEFI or BIOS, and Linux tools.
  • Experience scripting and customizing diagnostics, validation, and test workflows.
  • Experience with GPU supplier test code and open-source AI test and characterization tools.
  • Experience with system integration, validation, and performance characterization.
  • Strong understanding of high-speed buses and interconnects used in modern AI and compute platforms.
  • Demonstrated ability to debug and root-cause complex hardware and software issues.
  • Ability to document design intent and technical specifications clearly.
  • Strong communication skills with the ability to explain complex technical topics across engineering teams and executive audiences.
  • Proven ability to provide cross-functional technical leadership and collaborate effectively with internal teams and external partners.

Preferred Skills

  • Experience using hardware debuggers.
  • Experience with PCIe, DDR, Ethernet, USB, SPI, and related interfaces.
  • Experience with platform-level security technologies.
  • Experience with power circuit design and signal integrity.


Responsibilities

Platform Architecture and Definition

  • Participate in platform definition, architecture evaluation, and analysis for existing and next-generation Cloud AI platforms.
  • Evaluate system architectures, proposed implementations, and scaling and optimization strategies.
  • Review and assess third-party merchant silicon used for AI accelerator modules and GPU/CPU platforms.
  • Balance hardware performance priorities against power, cost, regulatory, and cross-functional requirements.

Platform Development and Oversight

  • Drive definition, development, integration, debug, characterization, and tuning of AI hardware platforms.
  • Provide platform development oversight for internal teams and third-party partners.
  • Work with in-house engineering experts on design reviews, schematics, board layout, and implementation decisions.
  • Document and specify design intent and technical details in collaboration with engineering teams.

System Integration, Validation, and Performance

  • Guide and support system integration, system test, qualification, and characterization.
  • Define and oversee system validation plans, diagnostics features, and test strategies.
  • Develop and expand system characterization and performance testing capabilities.
  • Use supplier-provided and approved open-source AI platform qualification and test tools.
  • Support definition of in-service system monitoring, error reporting, and operational health visibility.

Cross-Functional and Partner Collaboration

  • Collaborate with GPU and AI chip suppliers, system architects, firmware developers, and hardware engineers.
  • Partner with storage, networking, compute, quality, security, cloud orchestration, and manufacturing teams.
  • Support development program managers with technical assessments and planning.
  • Assist manufacturing teams to ensure hardware is secure, robustly evaluated, and production-ready.

Security, Support, and Operations

  • Participate in hardware platform security evaluations.
  • Guide internal teams and partners on scaling, monitoring, and deploying AI platforms into the cloud.
  • Serve as a senior technical advisor to Oracle hardware, software, cloud, and support teams.
  • Act as the final level of engineering support for complex deployed product issues.
  • Assist with root-cause analysis through lab replication, remote debug, and cross-team collaboration.


Qualifications

Career Level - IC5




Required Experience:

Staff IC

Key Skills

  • Engineering Support
  • Environment
  • Internship
  • Law Enforcement
  • Catering Operations

About Company

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...
