HPC Systems Engineer 117333

Job Location

Homewood - USA

Yearly Salary

$73,300 - $128,300

Vacancy

1 Vacancy

Job Description

The Advanced Research Computing at Hopkins (ARCH) group is seeking a highly qualified and motivated HPC Systems Engineer to join the systems team. The group's system (ROCKFISH), with over 45,000 cores and several petabytes of storage, serves the HPC and data-intensive science needs of researchers at Johns Hopkins University. The Systems Engineer contributes to the strategic planning, design, testing, organization, and implementation of cutting-edge technology projects for the facility. The systems team is responsible for the day-to-day administration of HPC clusters, high-performance storage systems, backups, networking, security, and any other services related to the operation of a large HPC center. The successful candidate will have experience in similar roles in high-performance computing (HPC) labs or university settings.


Specific Duties & Responsibilities


70% Systems Engineering, Administration, Security, and Oversight

  • Work with senior staff to design, organize, plan, test, and implement cutting-edge hardware designs for an HPC environment.
  • Extensively document systems processes so that users can easily find useful information and other IT staff can perform routine tasks and provide backup.
  • Provide stable solutions for HPC resources.
  • Maintain job scheduling and storage allocation systems and policies to ensure fair allocation of shared resources.
  • Maintain extensive monitoring systems to facilitate quick, proactive responses to routine failures and to provide comprehensive performance data logging.
  • Provide general system administration, backup, and escalation for other staff.
  • Assist with facilities-related issues that directly affect MARCC.
  • Ensure resources meet the community's needs and are highly available to the group with limited interruption.
  • Manage inventory of resources in coordination with respective vendors.
  • Automate user account creation, management, and purging.
  • Contribute to planning sessions on network and security issues for MARCC. Work closely with the central networking group.
  • Implement network configuration and security measures to assure effective utilization of resources.
  • Understand HPC technical needs. Work closely with the facility's director and oversight groups to successfully implement policies and procedures.
  • Create and maintain a stable, secure operating system and software environment that continues to meet users' evolving research needs.
  • Implement and maintain security measures to protect data subject to restrictions.
  • Manage data access restrictions on a per user and group basis.
  • Implement and maintain monitoring measures for data and system access.
  • Other Systems Tasks as assigned by supervisor.


20% Technological Research

  • Offer technical advice on new projects that directly involve HPC computing at Hopkins.
  • Develop custom tools where necessary and contribute useful creations back to open-source development efforts where appropriate.
  • Implement and test new technologies that could be beneficial to HPC.


10% Training/Education

  • Continuously evaluate new tools and technologies for use in existing and future clusters.
  • Attend department- and University-sponsored training to increase knowledge, improve skills, and learn new skills. May substitute University training for supervisor-approved, commercial, job-related course offerings.


Special Knowledge, Skills & Abilities:

  • Proven experience deploying large-scale, complex projects.
  • Proven experience across multiple technologies, with a background in applications, databases, middleware, etc.
  • In-depth knowledge of the design and organization of cutting-edge technology in HPC environments.
  • In-depth understanding of HPC cluster hardware and management software.
  • Understanding of massive, high-performance parallel storage and methodologies.
  • Expert knowledge of Unix/Linux systems administration, including all aspects of management, monitoring, performance analysis, and integration in potentially complex heterogeneous environments.
  • Knowledge of networking, high-speed interconnects, and network security principles in an HPC environment.
  • Use of configuration management tools (e.g. Bright, xCAT, Puppet, IPMI, ROCKS) to help maintain large-scale Linux clusters, supercomputers, storage systems, and smaller systems.
  • The ability to interact with peer institutions to support HPC directives, effectively furthering the goals of the MARCC facility.
  • Understand, implement, troubleshoot, and support job scheduling, resource management, and workload management systems, including diagnosis of failed jobs, implementation of policies, and investigations of new features and services.
  • Understand and support hierarchical file system infrastructure, software, and services, including high-performance parallel storage, backup systems, and robotic tape libraries.
  • Develop reports and customize tools that automate the monitoring of critical systems and automatically alert the team to issues.
  • Evaluate, implement, and manage appropriate high-level, complex software and hardware solutions, using best practices for the environment to ensure system integrity.
  • Install and configure infrastructure applications by following the industry's best practices to deliver effective solutions.
  • Maintain an effective schedule for systems backups and archive operations for mission-critical systems.
  • Audit and maintain user access authorization and authentication.
  • Generate periodic reports on resource utilization.
  • Maintain resource inventory using best practice applications.
  • Advanced knowledge of Linux, Apache, SQL, PHP/Python/Perl (LAMP) technology/toolkits.
  • Ability to handle high-priority escalations whenever necessary.
  • Ability to multitask while managing time and priorities.
  • Troubleshoot and solve difficult system issues as they arise.
  • Must be adaptable and able to meet conflicting deadlines.
  • Exceptional organizational skills.
  • Maintain effective and thorough documentation of all configuration and tasks performed.
  • Ability to automate systems administration tasks wherever possible.
  • Excellent oral and written interpersonal skills.
  • Ability to meet the physical requirements of the position.
  • Keep up to date on emerging technologies.
  • Research, recommend, and implement new technologies based on their value to the research facility.
  • Ability to maintain confidentiality.
  • Excellent customer service skills.
  • Excellent communication skills.
  • Must demonstrate strong critical thinking and analytical reasoning.


Internal and External Contacts

  • This position will interact with an array of departmental and central administrative offices, faculty, staff, researchers, and students, and with numerous external constituents (i.e. other college administrators and faculty, private businesses, industry partners, officials of federal and local agencies, and research foundations) for the purpose of accomplishing HPC technology goals.
  • This includes providing instruction on protocol, regulations, and guidelines pertinent to the agency and/or University.
  • Works routinely with JHU and UMCP faculty, administrators, students, and researchers.
  • Collaborates regularly with professional colleagues from the central organization and from other academic departments.
  • Collaborates regularly with colleagues in industry and at other peer institutions.


Minimum Qualifications
  • Bachelor's Degree.
  • Five years of related experience.
  • Additional education may substitute for required experience, and additional related experience may substitute for required education, to the extent permitted by the JHU equivalency formula.


Preferred Qualifications
  • Seven (7) years of experience managing Linux servers, with direct experience managing HPC clusters.
  • Experience as a high-level Linux system administrator.
  • Experience managing mission-critical services.
  • Familiarity with configuration of the HPC software stack, including MPI, OpenMP, Intel and GNU compilers, and math libraries.
  • Experience with open-source software compilation.
  • In-depth knowledge of TCP/IP networking and related protocols (InfiniBand, etc.).
  • Experience with scientific application management packages like pymodules or modules.
  • Excellent scripting skills (Python, Perl, shell).
  • Programming skills in C, C++, or a scientific language desired but not required.
  • Experience with MySQL or MariaDB database programming desired but not required.
  • Expert-level knowledge of configuration management and monitoring tools (Puppet, Nagios, etc.).
  • Experience configuring resource manager applications (like SLURM).
  • Experience with Apache administration.
  • Knowledge of scientific software applications in academic supercomputing environments.
  • Familiarity or experience with data subject to restrictions desired but not required.

Classified Title: Systems Engineer
Job Posting Title (Working Title): HPC Systems Engineer
Role/Level/Range: ATP/04/PE
Starting Salary Range: $73,300 - $128,300 Annually (Commensurate w/exp.)
Employee group: Full Time
Schedule: 37.5 hrs/wk, M-F
FLSA Status: Exempt
Location: Hybrid/Homewood Campus
Department name: Whiting School of Engineering
Personnel area: Whiting School of Engineering

Employment Type

Full Time
