Lead Platform Reliability Engineer, Global AI Platform & Solutions
Job Summary
The Lead Platform Reliability Engineer (PRE) ensures the stability, performance, and scalability of the shared platform that supports internal AI solution development. The role combines software engineering, SRE practices, and operations to keep the platform reliable and developer-friendly.
Position Responsibilities:
- Reliability and performance: Define SLOs/SLIs, track operations budgets, reduce MTTR, plan capacity, and tune autoscaling.
- Observability: Build and maintain logging, metrics, tracing, and alerting; instrument platform components; create runbooks and dashboards.
- Incident response: On-call for platform incidents; triage, mitigate, root-cause, and drive postmortems and corrective actions.
- Automation and tooling: Develop self-service capabilities, AIOps/MLOps/GitOps/CI/CD pipelines, and operational automations (provisioning, upgrades, backups).
- Infrastructure as code: Manage clusters, networks, storage, and policies via Terraform/Ansible; prevent configuration drift.
- Security and compliance: Enforce identity/RBAC, secrets management, supply chain security, and regulatory controls; collaborate with risk and audit.
- Scalability and cost: Optimize resource usage, plan capacity, and control spend (rightsizing, autoscaling, reservations/spot).
- Change management: Safe rollouts, progressive delivery, and policy-as-code guardrails.
- Platform productization: Treat the platform as a product; define operations SLAs in alignment with the product roadmap, service catalog, and developer experience.
- Collaborate with global engineering, security, and AI governance teams to ensure compliance with cross-geo regulations and Asia's data residency requirements.
- Operate scalable backend services supporting high-traffic agent interactions, retrieval operations, and real-time execution flows.
- Maintain AI services runbooks, playbooks, and enablement for GOCC.
Required Qualifications:
- Bachelor's degree in Computer Science/Engineering or equivalent experience (not strictly required if skills are demonstrated).
- 5-8 years of experience in DevOps/Platform Engineering or Production Operations.
- Proven track record operating large-scale distributed systems and running on-call.
- Operational experience with cloud-native development: Azure, Kubernetes, containers, CI/CD, and observability stacks.
- Knowledge of Python and/or Java/Scala/TypeScript for building backend services and automation.
- Understanding of AI solutions, LLM systems, retrieval architectures, embeddings, vector stores, prompt/tool orchestration, and agent workflow fundamentals.
- Knowledge of API design, asynchronous workflows, concurrency, reliability engineering (SLOs, error budgets), and performance tuning.
- Familiarity with security, governance, and compliance for AI/data systems (authN/authZ, data protection, audit logging, model governance).
- Ability to collaborate across global teams and translate business requirements into platform capabilities and operational SLAs.
Preferred Qualifications:
- ITIL & ITSM certification
- Azure Administrator/DevOps certificate (nice to have)
- Kubernetes: CKA/CKS certificate (nice to have)
- HashiCorp Terraform Associate certificate (nice to have)
When you join our team:
- We'll empower you to learn and grow the career you want.
- We'll recognize and support you in a flexible environment where well-being and inclusion are more than just words.
- As part of our global team, we'll support you in shaping the future you want to see.
#LI-Hybrid
The role being advertised is an existing vacancy.
About Manulife and John Hancock
Manulife Financial Corporation is a leading international financial services provider, helping people make their decisions easier and lives better. To learn more about us, visit .
Manulife is an Equal Opportunity Employer
At Manulife/John Hancock, we embrace our diversity. We strive to attract, develop, and retain a workforce that is as diverse as the customers we serve and to foster an inclusive work environment that embraces the strength of cultures and individuals. We are committed to fair recruitment, retention, advancement, and compensation, and we administer all of our practices and programs without discrimination on the basis of race, ancestry, place of origin, colour, ethnic origin, citizenship, religion or religious beliefs, creed, sex (including pregnancy and pregnancy-related conditions), sexual orientation, genetic characteristics, veteran status, gender identity, gender expression, age, marital status, family status, disability, or any other ground protected by applicable law.
It is our priority to remove barriers to provide equal access to employment. A Human Resources representative will work with applicants who request a reasonable accommodation during the application process. All information shared during the accommodation request process will be stored and used in a manner that is consistent with applicable laws and Manulife/John Hancock policies. To request a reasonable accommodation in the application process, contact .
Referenced Salary Location
Toronto, Ontario
Working Arrangement
Salary range is expected to be between $113,260.00 CAD - $210,340.00 CAD
Employees also have the opportunity to participate in incentive programs and earn incentive compensation tied to business and individual performance. The actual salary will vary depending on local market conditions, geography, and relevant job-related factors such as knowledge, skills, qualifications, experience, and education/training. If you are applying for this role outside of the primary location, please contact for the salary range for your location.
Manulife offers eligible employees a wide array of customizable benefits, including health, dental, mental health, vision, short- and long-term disability, life and AD&D insurance coverage, adoption/surrogacy and wellness benefits, and employee/family assistance plans. We also offer eligible employees various retirement savings plans (including pension and a global share ownership plan with employer matching contributions) and financial education and counseling resources. Our generous paid time off program in Canada includes holidays, vacation, personal, and sick days, and we offer the full range of statutory leaves of absence. If you are applying for this role in the U.S., please contact for more information about U.S.-specific paid time off provisions.
We use data and analytics technologies, such as artificial intelligence (AI) and automated processing tools, to analyze and process the information you provide to us or third parties in the application process. For more information, please refer to our personal information collection statement.
Required Experience:
IC
About Company
Manulife is a leading financial services group. We provide financial advice, insurance, as well as wealth and asset management solutions for individuals, groups and institutions.