Position: Senior Engineering Technician
Duration: Contract (12 months)
Location: Santa Clara, CA (100% onsite; travel to data centers in Santa Clara and Sunnyvale).
Job Description:
We are looking for a motivated Engineering Technician to support the Client's on-premises private cloud infrastructure! In this role, you will take on the challenge of providing and maintaining a compute farm of systems, including builders, packagers, and testers, that serves as a test bed for our developers worldwide to test hardware and software before release. The environment is huge, the scale massive, and the ask enormous!
What You'll Do:
- Collaborate closely with engineering teams (system architects, hardware/software engineers, QA, and more) to design, develop, debug, and release next-generation products.
- Manage and maintain a high-performing compute farm of builders, packagers, testers, and core infrastructure.
- Ensure availability targets are consistently met and lead system recovery efforts.
- Deploy and qualify systems while supporting exciting new technology bring-ups.
- Oversee inventory and lifecycle management for the Client's assets across data centers and labs.
- Gather critical metrics and create Standard Operating Procedure (SOP) documentation.
- Maintain a world-class, safe, and well-organized environment in our data centers and labs.
- Troubleshoot Linux/Windows hardware and infrastructure issues alongside engineers and platform operations teams.
- Plan, deploy, and maintain on-premises private cloud infrastructure, collaborating with data center and network engineering teams.
- Implement efficiency improvements to maximize availability, throughput, and test accuracy while meeting SLAs and KPIs.
- Represent the team in meetings with internal stakeholders and contribute to global operations.
What We Need to See:
- Associate's or Bachelor's degree in an engineering/technical major (or equivalent experience).
- 5 years of experience in data centers or large engineering labs.
- Familiarity with SCMs such as Git/Perforce.
- Proficiency in DCIM tools (Nautobot, etc.) and scripting (shell, Python, Ansible).
- Working knowledge of protocols/services such as TCP/IP, DNS, NFS, SSL, etc.
- Experience with Windows, Linux, and Mac operating systems.
- Hands-on experience with PCBs, GPUs, and system deployments.
- Exceptional communication skills, both written and verbal.
- Ability to explain technical concepts to non-technical audiences.
- Strong problem-solving skills and a collaborative spirit.
What Makes You Stand Out:
- Experience managing HPC clusters using tools like BCM and Slurm.
- Hands-on knowledge of OpenStack.
- Relevant certifications such as CCNA or equivalent.
- Strong background in Windows and Linux administration, with an understanding of dense data center design, including compute, storage, and networking.
- Experience with hypervisors and VM applications.
- Knowledge of DC infrastructure with an emphasis on liquid cooling.
- A track record of technical curiosity and innovation.
- Mechanically inclined and comfortable with tools and physical tasks.
- Energetic, enthusiastic, and aware of what it takes to get the team to the finish line.
- Willing to go the extra mile to get the job done!
- This is an onsite contract position and will require local travel to DCs within Santa Clara.
Qualifications/Key Responsibilities:
- 5 years of experience working in a data center/lab environment
- Associate's or Bachelor's degree in an engineering/technical major (manager prefers a bachelor's degree)
- Scripting/automation expertise
- Team focuses on early product development
- Strong coordination skills in the R&D space
- Experience managing HPC clusters (Slurm)
- 2-3 years of script building (Linux-based)
- Working experience with process development and driving tasks to completion
- Scrum/Agile
Software:
- Linux, Python
- Jenkins
- Scripting (Bash, Ansible)
- DCIM Tools (Nautobot)