Infrastructure Reliability Engineering, Senior Manager

Hong Kong Exchanges and Clearing

Job Location:

London - UK

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Infrastructure Reliability Engineering Senior Manager

Shift Pattern:

Standard 40 Hour Week (United Kingdom)

Scheduled Weekly Hours:

40

Corporate Grade:

C - Vice President

Reporting Line:

(UK Division) Information Technology

Location:

UK-London

Worker Type:

Permanent

About the London Metal Exchange and LME Clear:

The London Metal Exchange is the world centre for industrial metals trading. Most of the world's non-ferrous futures business is conducted on the LME's three trading platforms, totalling $18 trillion, 178 million lots and 4 billion tonnes, with a market open interest high of 1.8 million lots in 2024. All trades are cleared and settled by LME Clear.

Participants can transfer or take on price risk against aluminium, copper, nickel, tin, zinc, lead, molybdenum, cobalt, lithium, steel scrap, rebar and hot-rolled coil, as well as alumina, aluminium premiums and alloys.

The LME and LME Clear are HKEX Group companies.

Overall Purpose of Role:

This role is accountable for initially establishing, then maturing, a best-of-breed Infrastructure Reliability Engineering (IRE) function, embedding reliability engineering as a core discipline across the technology lifecycle, from design through live operation, in support of trading-critical and regulatory-significant services.

To provide senior leadership across Infrastructure Reliability Engineering, accountable for the resilience, availability and operational readiness of the LME Group technology estate. Lead the design and delivery of complex infrastructure transformation, platform modernisation and re-architecture initiatives, ensuring secure, compliant and highly reliable services that support trading-critical operations and regulatory obligations.

Responsibilities:

Establish, mature and continuously evolve the Infrastructure Reliability Engineering function, defining the IRE operating model, engagement patterns and service boundaries across infrastructure, architecture, operations, security and application teams.

Set, maintain and enforce consistent reliability engineering standards, patterns and tooling across the infrastructure estate, balancing resilience, regulatory assurance and operational efficiency.

Act as senior Infrastructure Reliability Engineering SME across major programmes end-to-end (discovery, dependency mapping, design, planning, build, cutover, fallback), with direct accountability for service stability and risk reduction for trading-critical platforms.

Drive a proactive reliability and failure engineering culture, including structured risk identification, resilience testing, failover validation and scenario-based exercises for trading-critical and systemically important services.

Act as the accountable owner for Infrastructure Operational Readiness, ensuring platforms and services do not transition into live operation without meeting mandated readiness, observability, recoverability and supportability criteria.

Define and embed a consistent reliability measurement framework across infrastructure platforms, including service level indicators, objectives and leading indicators of operational risk, enabling data-driven prioritisation and informed investment decisions.

Build, lead and develop a high-performing Infrastructure Reliability Engineering team, defining clear role expectations, capability standards and development pathways.

Foster a culture of engineering excellence, shared ownership and continuous improvement, ensuring operational knowledge and resilience capability are institutionalised and not dependent on individuals.

Act as a senior authority on infrastructure resilience and operational risk, influencing strategic decisions, architectural direction and investment priorities to ensure reliability is designed in, not retrofitted.

Own measurable infrastructure reliability outcomes, including availability, resilience, recovery performance and operational risk reduction, with regular executive-level reporting against agreed targets.

Own and enforce reliability governance, including stage gates, design authorities, risk and issue management, CAB/change control and auditable documentation aligned to ITSM, IBS and regulatory expectations.

Lead platform modernisation and resilience engineering initiatives, including containerisation and cloud-adjacent platforms (e.g. Kubernetes, OpenShift), working closely with Architecture, InfoSec and application teams to embed reliability, security and observability by design.

Define and drive the LME Infrastructure Reliability posture, including fault tolerance, redundancy, capacity planning, disaster recovery and failover strategies across on-prem and hybrid environments.

Lead senior-level technical discovery and design workshops to shape scope, delivery approach and resourcing for reliability-critical initiatives, ensuring alignment with IOE priorities and business outcomes.

Establish and assure Operational Readiness (ORR) standards: runbooks, monitoring and alerting, SLIs/SLOs, performance and capacity baselines, service transition and operational handover.

Ensure infrastructure platforms meet security and compliance requirements (e.g. CIS, ISO 27001, NIST), covering identity and access management, encryption, auditability and regulatory evidence.

Engage at senior stakeholder level across Technology and the business, providing clear communication on delivery status, operational risk, dependencies, cost forecasts and resource demand.

Academic and Professional Qualifications Required:

Bachelor's degree in Computer Science, Engineering, Information Technology or a closely related discipline.

Demonstrable track record of continuous professional development in infrastructure, solutions engineering or technology transformation.

Required Knowledge and Level of Experience:

10 years of experience leading large-scale Infrastructure or Reliability Engineering functions, with demonstrable accountability for the availability, resilience and operational performance of mission-critical systems.

Proven experience establishing, scaling or materially maturing an Infrastructure Reliability, Platform Reliability or equivalent function within a complex, regulated or high-availability environment.

Significant experience operating in regulated or high-assurance environments (e.g. financial services, exchanges, clearing or equivalent).

Experience influencing senior leadership and steering complex transformation initiatives across multiple technology domains.

Significant experience leading or assuring large-scale enterprise Linux estates (e.g. RHEL-based), including responsibility for reliability, resilience and operational risk in regulated or high-availability environments.

Skill Set and Core Competencies Required for Role:

Deep expertise in infrastructure reliability engineering, resilience patterns and operational risk management

Strong governance, assurance and regulatory mindset

Excellent stakeholder engagement and senior communication skills

Ability to lead multidisciplinary technical teams through complex change

Data-driven approach to reliability, performance and continuous improvement

Reliability engineering, resilience patterns and operational risk management.

Governance, assurance and regulatory mindset.

Data-driven analysis and decision-making.

Senior stakeholder influence and technical authority.

Team leadership and capability development.

Technical Skills: Infrastructure Reliability Engineering

Enterprise Linux / RHEL mastery

Linux reliability, performance and capacity engineering

Automation, standardised builds, configuration management

Observability, diagnostics and root-cause analysis

Linux host reliability for container / OpenShift platforms

Linux security hardening and compliance

Linux-level failure engineering and resilience patterns

Senior Linux technical authority

Personal Qualities:

High integrity, ownership and accountability in all aspects of work.

Structured, pragmatic and calm under pressure. Able to manage competing priorities and deliver in high-stakes environments.

Collaborative and inclusive, building strong cross-functional relationships and fostering a culture of open communication.

Curious and improvement-oriented, always seeking to challenge the status quo and drive innovation with data-driven insights.

Adaptable and resilient, able to navigate ambiguity and lead teams through complex change.

Commitment to diversity, equity and inclusion, respecting and valuing the unique contributions of all colleagues.

Comfortable holding the line on operational risk and readiness in high-pressure, time-sensitive delivery environments.

The LME is committed to creating a diverse environment and is proud to be an equal opportunity employer. In recruiting for our teams, we welcome the unique contributions that you can bring in terms of education, ethnicity, race, sex, gender identity, expression and reassignment, nation of origin, age, languages spoken, colour, religion, disability, sexual orientation and beliefs. In doing so, we want every LME employee to feel our commitment to showing respect for all and encouraging open collaboration and communication.

Required Experience:

Senior Manager
