Data Centre Operations Manager
Job Summary
Be a part of something BIG!
As an Operations Manager for Singtel's GPU-as-a-Service (GPUaaS) platform, you will be responsible for overseeing the day-to-day operational management of high-density data centre facility infrastructure.
This role leads the operational team responsible for data centre facility operations, infrastructure readiness, vendor coordination and operational governance to ensure the reliability and availability of Singtel's GPUaaS platform.
You will work closely with the GPU operations team managing GPU clusters and AI infrastructure, while ensuring the data centre environment supports the demanding power, cooling and operational requirements of GPU-based AI workloads.
Responsibilities:
- Lead and manage the daily DC operations of Singtel's GPUaaS data centre facility supporting GPU cluster infrastructure.
- Manage and supervise the data centre operations team responsible for facility and infrastructure operations.
- Ensure 24/7 operational readiness of the GPUaaS platform environment.
- Define and implement operational procedures, including SOPs, MOPs and emergency response procedures (ERPs), for mission-critical DC facility and infrastructure operations.
- Ensure DC incidents are responded to, triaged, escalated and coordinated appropriately based on criticality and SLA requirements, including conducting root cause analysis for operational events impacting infrastructure availability.
- Oversee operations of data centre facility infrastructure, including power, cooling, environmental monitoring and physical infrastructure.
- Manage operations related to:
  - Electrical infrastructure (UPS, PDUs, generators)
  - Cooling systems, including air cooling and direct liquid cooling (DLC) systems
  - Environmental monitoring systems
  - Leak detection systems
  - Building Management Systems (BMS)
- Ensure the data hall environment supports high-density GPU server deployments with appropriate power and thermal management.
- Plan and coordinate maintenance activities, shutdowns and infrastructure upgrades with internal teams and vendors.
- Monitor facility capacity, including power, cooling and rack space utilization, to support GPU infrastructure growth.
- Oversee physical infrastructure operations, including rack installation, hardware deployment and cabling management. From a data centre operations perspective, coordinate the deployment, operational stabilization, maintenance and lifecycle management of GPU infrastructure components (GPU servers, storage systems and networking equipment).
- Ensure proper asset management, rack documentation and infrastructure inventory tracking.
- Support platform engineering and operations teams during hardware troubleshooting and infrastructure maintenance activities.
- Manage relationships with data centre facility vendors, contractors and service providers.
- Ensure vendors adhere to WSH regulations security policies and operational procedures while working in the GPUaaS data centre environment.
- Coordinate vendor access and ensure proper supervision of all vendor activities within the data hall.
- Monitor vendor performance against service level agreements (SLAs) and operational commitments.
- Oversee monitoring of data centre infrastructure and facility systems, including power, cooling and environmental sensors.
- Prepare and review operational reports on data centre health, incidents and infrastructure performance.
- Maintain operational documentation, including runbooks, maintenance procedures, operational logs and infrastructure documentation.
- Ensure compliance with data centre physical security policies and operational governance requirements.
- Manage physical access control to the data hall and critical infrastructure areas.
- Ensure operations align with internal governance frameworks and industry best practices, including standards such as ISO 27001 where applicable.
- Support audit and compliance activities relating to data centre operations and facility management.
Requirements
- Minimum of 8 years in data centre operations and management with at least 3 years in a leadership/managerial position.
- Proven experience managing data centre operational teams and vendor coordination.
- Strong knowledge of and experience in data centre facilities, including physical and DC security. Knowledge of and experience in high-density DC power environments and liquid cooling is a bonus.
- Well versed in the maintenance and upkeep of various equipment, including electrical and mechanical systems.
- Experience in leadership/managerial roles with excellent team management skills.
- Organized and adaptable to changes in work schedules and arrangements.
- Strong interpersonal and professional communication skills, as well as presentation skills.
- Proficiency in managing customer interactions and improving service delivery to enhance customer experience.
- Experience supporting high-density compute environments or hyperscale infrastructure is highly desirable.
Rewards that Go Beyond
Full suite of health and wellness benefits
Ongoing training and development programs
Internal mobility opportunities
Your Career Growth Starts Here. Apply Now!
Required Experience:
Manager
About Company
The Singtel Group, Asia's leading communications group, provides a diverse range of services including fixed, mobile, data, internet, TV, infocomms technology (ICT) and digital solutions.