HPC System Site Lead

Employer Active

1 Vacancy
The job posting is outdated and position may be filled

Job Location

Los Alamos, NM - USA

Monthly Salary

Not Disclosed

Job Description

Job Responsibilities:

  • Maintain HPC system availability for the customer.
  • Lead the technical output of onsite client hardware technicians, system administrators, and system analysts.
  • Serve as the primary customer focal point for system support and onsite activities.
  • Maintain a full-time, 100% presence on the customer site during standard business hours.
  • Interact routinely, face-to-face and in groups, with the site team to organize tasks, follow up, and assist with challenges they encounter.
  • Track system health and Cases; review them regularly (weekly) with customers and HPC leadership.
  • Maintain availability reports for tracking SLAs.
  • Preplan system upgrades: review plans with the team and customers, and arrange for staffing and equipment, including prearranged open lines of communication in case of issues.
  • Escalate Cases, assist team members in escalating Cases to next-tier support, and follow up to drive closure via escalation processes.
  • Manage onsite parts inventory using business tools.
  • Manage site tools and equipment.
  • Maintain the on-call schedule to support our 24x7x365 contracts.
  • Assist with hardware and system installation activities for new systems.

Team Support

  • Build strong working relationships with teammates, leadership, and customers.
  • Maintain awareness of upcoming training and prompt team members to complete it.
  • Maintain a team calendar of planned leave, including the on-call schedule for operational issues.
  • Provide performance review input to the District Service Manager (DSM), along with suggestions for team member performance and development.
  • Escalate to the DSM any personnel issues, risk of missing an SLA, or customer satisfaction concerns.
  • Maintain a clean and safe working environment.
  • Support the DSM in onboarding new team members by providing site-specific details (e.g., customer network accounts, badge, parking, etc.).

Required Qualifications & Experience:

  • 8 years of professional experience and a Bachelor of Arts/Science or equivalent degree in computer science or a related area of study; without a degree, three additional years of relevant professional experience (11 years in total).
  • In-depth knowledge of high-performance computing (HPC) systems.
  • Proficiency in managing and optimizing HPC environments, including system configuration, performance tuning, and troubleshooting.
  • Strong understanding of parallel computing, cluster management, and distributed computing technologies.
  • Experience with HPC workload managers and schedulers such as SLURM, PBS, or similar.
  • Advanced knowledge of Linux operating systems.
  • Familiarity with software development tools and environments commonly used in HPC, including compilers, debuggers, and performance analysis tools.
  • Experience with scripting languages such as Python or Bash.
  • Proven experience in system administration, including hardware and software installation, maintenance, and upgrades.
  • Knowledge of network architecture, storage solutions, and data management within HPC environments.
  • Ability to implement and manage security protocols and best practices in a high-performance computing context to maintain the customer's security posture.
  • Strong project management skills, including planning, execution, and monitoring of HPC projects.
  • Ability to lead and coordinate a team of technical professionals, ensuring timely and successful project delivery.
  • Experience in resource allocation, budgeting, and performance metrics tracking for HPC projects.
  • Excellent problem-solving abilities with a focus on identifying root causes and implementing effective solutions.
  • Strong analytical skills to assess system performance and make data-driven decisions for optimization.
  • Ability to troubleshoot complex technical issues in a high-stakes HPC environment.
  • Exceptional communication skills, both written and verbal, to interact effectively with team members, stakeholders, and clients.
  • Ability to convey complex technical information clearly and concisely to non-technical audiences.
  • Strong collaboration skills to work effectively within a multidisciplinary team and across organizational boundaries.
  • Extensive experience in HPC system management and administration, with a track record of successful project and team leadership.
  • Willingness to participate in ongoing professional development and training opportunities, which may require travel.

Preferred Qualifications:

  • CompTIA A+ or Server+ Certification
  • Security Certification
  • Linux Certification
  • PMP or Project+
  • Vendor Certifications
  • Experience with ticket-tracking software (Salesforce, SmartSheets; any ticket-tracking system is acceptable)

Employment Type

Full Time
