Principal Infrastructure Engineer
Department:
Job Summary
Principal Software Development Engineer
Oracle, Tokyo, Japan (Hybrid)
We are looking for a hands-on Principal Core Software Development Engineer with deep expertise in HPC, GPU infrastructure, and AI platform engineering to join our growing team. In this role, you will design and deploy large-scale accelerated computing solutions, lead customer engagements, and drive adoption of cutting-edge AI workloads on Oracle Cloud Infrastructure (OCI). This is an exceptional opportunity for someone with strong technical acumen, customer focus, and a passion for cloud-native innovation.
As a Principal Core Software Development Engineer, you will be at the forefront of designing and implementing next-generation accelerated computing and AI solutions on Oracle Cloud Infrastructure (OCI). You will engage directly with customers ranging from startups to strategic accounts, helping them architect and deploy complex HPC and GPU clusters, AI platforms, and intelligent agentic solutions across POC and production environments. You will play a pivotal role in pre-sales technical consulting, solution engineering, and AI transformation strategy.
This is a highly visible and influential role that combines deep technical skills with a consultative approach. You will support customers ranging from emerging AI startups to Fortune 500 companies, develop scalable AI architectures, and contribute to Oracle's strategic vision for cloud and AI adoption.
Key Responsibilities
Architect and deploy large-scale GPU/HPC infrastructure on OCI using tools like Terraform, Ansible, Slurm, and Kubernetes.
Build automated solutions for cluster provisioning, software deployment, and infrastructure as code.
Collaborate with Oracle's largest enterprise customers to define and tailor solutions that meet high-performance compute and AI requirements.
Support LLM-based solutions, agentic AI systems, and robotic AI platforms from design through deployment.
Act as a trusted technical advisor, guiding customers on best practices, cloud migration strategies, and deployment patterns.
Conduct customer training workshops and technical deep dives to enable successful cloud adoption.
Collaborate cross-functionally with product, support, and engineering teams to close technical gaps and influence product roadmaps.
Develop and share technical assets, including competitive differentiators, code samples, demos, blogs, and white papers.
Identify and work with key AI partners to support customer requirements from design to deployment.
Required Technical Skills
Hands-on expertise with GPU and HPC architecture in cloud and on-prem environments.
Proficiency in scripting and automation: Python, Bash, PowerShell, Terraform, Ansible.
Experience with cluster managers (Slurm, PBS, Bright), Kubernetes, and container orchestration.
Knowledge of RDMA, InfiniBand, MPI, and distributed file systems.
Core cloud-native experience.
Familiarity with AI/ML platforms, large language models (LLMs), and inference serving stacks.
Business & Leadership Skills
5 years in pre-sales technical consulting or customer-facing solution architecture.
Strong communication and presentation skills for both technical and executive audiences.
Passion for working with top-tier customers and partners to deliver innovative cloud solutions.
Ability to translate complex technical capabilities into business-aligned strategies.
Preferred Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
Demonstrated thought leadership through publications, speaking engagements, or community contributions.
Experience working with Oracle Cloud Infrastructure (OCI) or similar cloud platforms.
Responsibilities
The Principal Core Software Development Engineer is responsible for leading the design, deployment, and support of large-scale AI, GPU, and HPC infrastructure solutions on Oracle Cloud Infrastructure (OCI). The role partners closely with customers throughout the entire engagement lifecycle, from solution architecture and Proof of Concept (POC) through production deployment, optimization, and ongoing operational support. As a trusted technical advisor, the engineer provides guidance on cloud-native architectures, Kubernetes, Slurm, AI platforms, automation, and best practices while working closely with Product Management, Engineering, Support, Sales, and partners to deliver successful customer outcomes. In addition, the role contributes to Oracle's technical leadership by developing reusable assets, automation, reference architectures, and technical enablement content that accelerate customer adoption and strengthen Oracle's position in AI and cloud infrastructure.
Qualifications
Career Level - IC4
Required Experience:
Staff IC
About Company
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when eve ...