Principal HPC Architect
Milpitas, CA - USA
Job Summary
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.
Group/Division
The Information Technology (IT) group at KLA is involved in every aspect of the global business. IT's mission is to enable business growth and productivity by connecting people, process and technology. It focuses not only on enhancing the technology that enables our business to thrive but also on how employees use and are empowered by technology. This integrated approach to customer service, creativity and technological excellence enables employee productivity, business analytics and process excellence.
Job Description/Preferred Qualifications
The Principal HPC Architect designs, builds, optimizes and supports large-scale compute environments used for scientific computing, AI/ML workloads, simulation and data-intensive research. This role blends systems engineering, performance tuning, cluster architecture and hands-on troubleshooting. The engineer partners with researchers, developers and IT teams to deliver reliable, scalable and high-performance compute infrastructure.
Key Responsibilities:
HPC Architecture & Engineering:
Design and implement HPC clusters, including compute, storage, networking and job-scheduling components.
Evaluate and integrate new technologies (GPUs, accelerators, interconnects, filesystems).
Develop automation for cluster provisioning, configuration and lifecycle management.
Architect solutions for large-scale parallel workloads, AI/ML pipelines and data-intensive applications.
Performance Optimization:
Profile and tune applications for CPU, GPU, memory and I/O performance.
Optimize MPI, OpenMP, CUDA and other parallel programming frameworks.
Benchmark hardware and software stacks to guide procurement and architecture decisions.
Operations & Reliability:
Maintain and monitor HPC clusters, job schedulers (Slurm, PBS, LSF) and distributed filesystems (Lustre, GPFS, BeeGFS).
Troubleshoot complex system issues across compute, storage and network layers.
Implement security best practices, patching and compliance controls.
Ensure high availability and efficient resource utilization.
Automation & DevOps:
Build and maintain CI/CD pipelines for HPC-related software and infrastructure.
Use tools such as Ansible, Terraform, Kubernetes or custom scripts to automate workflows.
Develop monitoring and observability solutions (Prometheus, Grafana, ELK, etc.).
Collaboration & Leadership:
Work closely with researchers, data scientists and engineering teams to support workload optimization.
Provide technical leadership, mentorship and guidance to junior engineers.
Document architectures, procedures and best practices.
Participate in capacity planning and long-term HPC strategy.
Required Qualifications:
Extensive experience with Linux systems engineering in large-scale compute environments.
Solid understanding of distributed systems and cloud infrastructure.
Deep knowledge of HPC schedulers (Slurm preferred), MPI stacks and parallel computing models.
Strong understanding of high-speed interconnects (InfiniBand, RoCE) and distributed storage systems.
Proficiency in scripting languages (Python, Go, Bash) and automation frameworks.
Experience with GPUs (NVIDIA CUDA, MIG, NVLink) and accelerator-based computing.
Familiarity with containerization (Singularity/Apptainer, Docker) in HPC contexts.
Strong troubleshooting skills across hardware, OS and application layers.
Understanding of networking fundamentals (TCP/IP, DNS, load balancing).
Background in high-availability and distributed systems at scale.
Soft Skills:
Excellent communication and cross-functional collaboration.
Ability to translate research needs into technical solutions.
Strong ownership mindset and ability to lead complex initiatives.
Minimum Qualifications
Doctorate (Academic) Degree and related work experience of 8 years; Master's Level Degree and related work experience of 12 years; Bachelor's Level Degree and related work experience of 15 years
Base Pay Range: $162,700.00 - $284,700.00 Annually
Primary Location: USA-CA-Milpitas-KLA
KLA's total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life and other voluntary benefits; 401(K) including company matching; employee stock purchase program (ESPP); student debt assistance; tuition reimbursement program; development and career growth opportunities and programs; financial planning benefits; wellness benefits including an employee assistance program (EAP); paid time off and paid company holidays; and family care and bonding leave. Interns are eligible for some of the benefits listed. Our pay ranges are determined by role, level and location. The range displayed reflects the pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including state minimum pay wage rates, location, job-related skills, experience and relevant education level or training. We are committed to complying with all applicable federal and state minimum wage requirements where applicable. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.
KLA is proud to be an Equal Opportunity Employer. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us at or at 1- to request accommodation.
Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons who are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees, either directly or on behalf of KLA. Please ensure that you have searched KLA's Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews, in person or by video conference, with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and confidentially handle your information.
Required Experience:
Staff IC
About Company
Calling the adventurers ready to join a company that's pushing the limits of nanotechnology to keep the digital revolution rolling. At KLA, we're making technology advancements that are bigger, and tinier, than the world has ever seen. Who are we? We research, develop, and manufacture t ...