High-Performance Computing (HPC) Center Operations Manager

LLNL

Job Location:

Livermore, CA - USA

Monthly Salary: Not Disclosed
Posted on: 5 hours ago
Vacancies: 1 Vacancy

Job Summary

We have an opening for a High-Performance Computing (HPC) Center Operations Manager to lead a team of 15 technicians providing 24x7 support for HPC systems and facilities. You will oversee advanced monitoring and diagnostics, ensure secure and reliable operations, and implement innovative technical solutions in facilities (mechanical, electrical, cooling) and systems maintenance and operations (monitoring, decommissioning, hardware repairs, software updates). As the Operations Manager, you will provide technical leadership to engineers and technicians, drive continuous improvement in service delivery, and facilitate interactions between the operations and facilities teams. You will collaborate with senior management on planning, budgeting, and organizational initiatives within the Livermore Computing (LC) program. This position includes advanced technical work and leading a team of engineers or computer scientists in technical projects related to HPC facilities or systems engineering (e.g., designing, validating, and optimizing large-scale cooling infrastructure; troubleshooting HPC facilities and systems during emergency situations; collaborating with engineers and external partners to improve facility operations).

As a Group Leader in the Livermore Computing Division within the Computing Directorate, you will manage staff recruiting and development, and work with the other LC Division managers to ensure consistent administrative practices and training.

This position will be filled at the SEL.3 or SEL.4 level.

You will

  • Provide expert technical leadership to team members, including recruiting, hiring, mentoring, conducting performance appraisals, facilitating quarterly feedback sessions and one-on-one meetings, and managing salary and career development to support staff growth and operational excellence.
  • Oversee 24x7 support of HPC systems and networks while utilizing advanced monitoring and diagnostic tools to ensure reliability and rapid incident response for systems and supporting infrastructure. Provide guidance and support to shift supervisors during operational events, and ensure effective incident response, resolution, and reporting.
  • Establish, implement, and continuously improve procedures, schedules, and work priorities for HPC operations, identifying and developing key growth areas for staff and processes.
  • Lead the development and deployment of innovative tools and processes to enhance operational efficiency and technical service delivery for HPC facilities and operations.
  • Manage multiple vault-type rooms, oversee siting and infrastructure projects, and ensure strict compliance with safety and security policies and requirements.
  • Develop formal training plans to enhance team skills in alarm response, safety practices, HPC system monitoring, troubleshooting, repair, and issue escalation for operations and facilities teams.
  • Collaborate with senior management in planning, budgeting, and decision-making, and represent the organization in vendor meetings, cross-divisional initiatives, and external organizations such as the Energy Efficient High Performance Computing Working Group, HPC operational reviews, or other professional best-practice groups.
  • Keep pace with the escalating demands of next-generation platforms by providing solutions for highly unusual and complex HPC engineering challenges that arise from the intersection of extreme power density, precision cooling demands, evolving HPC compute loads, and mission-critical uptime requirements.
  • Perform other duties as assigned.

Qualifications:

  • This position requires an active Department of Energy (DOE) Q-level clearance or active Top Secret clearance issued by another U.S. government agency at the time of hire.
  • Bachelor's degree in engineering, computer science, or related field, or equivalent combination of education and experience in HPC Facilities and Operations.
  • Significant experience managing and troubleshooting HPC environments, including monitoring and maintenance of systems (e.g., computers, storage) and facilities (e.g., mechanical, electrical, cooling systems).
  • Advanced technical experience installing and operating HPC equipment, networks, or associated facilities, and resolving issues in cooperation with vendors and staff.
  • Significant experience in recruiting and supervising technical staff, preparing performance reviews, and participating in performance management processes.
  • Advanced communication, facilitation, and collaboration skills to lead a group, explain policies, and interact with management, technical teams, and vendors.
  • Significant experience developing written processes and/or procedures to improve service delivery and operational efficiency, and experience training technicians and engineers and assessing skills.
  • Advanced knowledge of data center infrastructure and equipment.

Additional qualifications at the SEL.4 level

  • Substantial management, technical, administrative, and leadership skills that enable building positive, trust-based relationships, executing performance management, and directing and monitoring the work of a diverse range of technicians and engineers effectively.
  • Expert knowledge of strategic planning, advanced problem solving, decision making, and analytical skills necessary to independently anticipate, analyze, advise, recommend, approve appropriate actions, and implement solutions to highly complex issues having significant impact.
  • Highly advanced experience managing complex computer installation projects, independently developing and designing plans, and preparing written specifications and drawings for HPC projects.

Qualifications We Desire

  • Extensive experience working in a High-Performance Computing Center and responding to emergency situations to diagnose and fix significant issues with computers or mechanical equipment while under pressure.
  • Experience in payroll supervision, organizational performance alignment, salary management, and knowledge of DOE/NNSA/LLNL policies and procedures.
  • Experience with HVAC, electrical, and structural systems in a data center environment.

Pay Range

$189,660 - $240,528 annually for SEL.3

$227,430 - $288,396 annually for SEL.4


Additional Information:

Position Information

This is a Career Indefinite position. Lab employees may be considered for this position.

Security Clearance

This position requires an active Department of Energy (DOE) Q-level clearance or active Top Secret clearance issued by another U.S. government agency at time of hire. 

Wireless and Medical Devices

Per the Department of Energy (DOE), Lawrence Livermore National Laboratory must meet certain restrictions with the use and/or possession of mobile devices in Limited Areas. Depending on your job duties, you may be required to work in a Limited Area where you are not permitted to have a personal and/or laboratory mobile device in your possession. This includes, but is not limited to, cell phones, tablets, fitness devices, wireless headphones, and other Bluetooth/wireless enabled devices.

If you use a medical device that pairs with a mobile device, you must still follow the rules concerning the mobile device in individual sections within Limited Areas. Sensitive Compartmented Information Facilities require separate approval. Hearing aids without wireless capabilities, or with wireless that has been disabled, are allowed in Limited Areas, Secure Space, and Transit/Buffer Space within buildings.

Equal Employment Opportunity

We are an equal opportunity employer that is committed to providing all with a work environment free of discrimination and harassment. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, pregnancy, protected veteran status, age, citizenship, or any other characteristic protected by applicable laws.

Reasonable Accommodation

Our goal is to create an accessible and inclusive experience for all candidates applying and interviewing at the Laboratory. If you need a reasonable accommodation during the application or the recruiting process, please use our online form to submit a request.

California Privacy Notice

The California Consumer Privacy Act (CCPA) grants privacy rights to all California residents. The law also entitles job applicants, employees, and non-employee workers to be notified of what personal information LLNL collects and for what purpose. The Employee Privacy Notice can be accessed here.


Remote Work:

No


Employment Type:

Full-time

About Company

Join us and make YOUR mark on the World! Are you interested in joining some of the brightest talent in the world to strengthen the United States’ security? Come join Lawrence Livermore National Laboratory (LLNL), where our employees apply their expertise to create solutions for BIG idea ...
