Principal AI Cloud Engineer

Bank Of Montreal

Job Location:

Toronto - Canada

Monthly Salary: $103,200 - $192,000
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Application Deadline:

11/29/2025

Address:

100 King Street West

Job Family Group:

Data Analytics & Reporting

The Team

We accelerate BMO's AI journey by building enterprise-grade, cloud-native AI solutions. Our team combines engineering excellence with cutting-edge AI to deliver scalable, secure, and responsible solutions that power business innovation across the bank. We enable and accelerate our partners on their AI journeys across the enterprise, helping teams across BMO unlock value at scale. We support one another in times of need and take pride in our work. We are engineers, AI practitioners, platform builders, thought leaders, multipliers, and coders. Above all, we are a global team of diverse individuals who enjoy working together to create smart, secure, and scalable solutions that make an impact across the enterprise. Our ambition is bold: deploy our capital and resources to their highest and most profitable use through a digital-first operating model powered by data and AI-driven decisions.

The Impact

As a Principal Cloud AI Engineer, you are a hands-on technical developer who designs, builds, and scales cloud-native AI solutions and products. You help set engineering standards, establish patterns, mentor senior engineers, and partner with multiple teams to deliver resilient, governed, and cost-efficient AI at enterprise scale. You'll help shape and evolve our AI cloud strategy, from model serving and LLMOps to security, observability, and compliance, so teams across the bank can innovate safely and rapidly.

You will advance BMO's Digital First strategy by:

  • Defining reference and production-grade solutions for AI/GenAI on cloud (AWS preferred; multi-cloud aware).

  • Building reusable, secure, and observable components (APIs, SDKs, microservices, pipelines).

  • Operationalizing LLMs and RAG with strong controls and Responsible AI guardrails.

  • Driving platform roadmaps that enable faster delivery, lower risk, and measurable business outcomes.

What's In It for You

  • Influence the technical direction of enterprise AI and the platform primitives others build on.

  • Ship high-impact systems used across many business lines and products.

  • Work across the full stack: cloud infra, data/feature pipelines, model serving, LLMOps, and DevSecOps.

  • Partner with a leadership team invested in your growth and thought leadership.

Responsibilities

Infrastructure & Platform Builder

  • Design, build, and operate cloud-native AI infrastructure for ML/GenAI workloads:

    • Compute: GPU/CPU clusters, autoscaling, spot instance strategies

    • Networking: AWS VPC, PrivateLink, peering, multi-region HA/DR

    • Storage & Databases: high-performance data lakes (e.g. S3-based data lake), relational DBs, vector DBs (FAISS, Milvus, Pinecone, pgvector)

    • Security: IAM, Secrets Manager / KMS-backed secrets management and encryption, policy-as-code

  • Implement observability and reliability for AI infra:

    • Metrics (latency, throughput, GPU utilization, cost)

    • Logging/tracing (OpenTelemetry), SLOs/SLIs for infra services

  • Build CI/CD and GitOps pipelines for infrastructure-as-code (Terraform/CloudFormation) and AI platform components

  • Drive FinOps for AI infra: GPU rightsizing, caching, inference optimization, cost governance

Application & Service Enablement

  • Enable frontend and backend services for AI platforms:

    • Secure APIs, microservices, and event-driven architectures

    • Integration with custom model runtimes (TensorRT-LLM, vLLM, Triton/KServe)

  • Provide infrastructure support for RAG systems: embeddings, chunking, retrieval pipelines

  • Ensure scalable serving infrastructure for LLMs and ML models with caching and token optimization

Strategy & Architecture

  • Define and evolve AI infrastructure reference architecture for cloud (AWS preferred):

    • Container orchestration (Kubernetes/EKS), service mesh, ingress

    • Serverless/event-driven patterns for AI pipelines

    • Multi-region HA/DR, compliance-ready designs

  • Establish standards and best practices for containerization, IaC, and secure networking for AI systems

Security, Risk & Governance

  • Implement defense-in-depth for AI infra:

    • IAM least privilege, private networking, KMS/Secrets Manager, SBOM, image signing

  • Ensure compliance and Responsible AI controls at infra level:

    • Data residency, encryption, lineage, audit readiness

Delivery & Operations

  • Lead infrastructure discovery and solution design with stakeholders

  • Operate platforms with SRE principles: error budgets, incident response, chaos testing

  • Mentor engineers; create reusable IaC modules, templates, and golden paths

Must-Have Qualifications

  • Bachelor's/Master's/PhD in CS, Engineering, or related field

  • 7 years building large-scale distributed cloud infrastructure

  • 5 years hands-on with AWS (preferred); Azure/GCP nice to have

  • Proven experience with AI/ML infra: GPU clusters, Kubernetes, CI/CD, observability

  • Strong in IaC (Terraform/CloudFormation), Kubernetes, networking, security

  • Expertise in cloud-native patterns: containers, service mesh, serverless

  • Familiarity with MLOps/LLMOps infra: model serving, feature stores, vector DBs

  • Programming in Python (infra automation) and one of Go/TypeScript for tooling

  • Understanding of frontend/backend integration for AI services

Nice-to-Have

  • GPU optimization (CUDA/NCCL, TensorRT-LLM)

  • Observability tools (Prometheus, Grafana, OpenTelemetry)

  • Event streaming (Kafka/Kinesis), real-time systems

  • Experience with AI platform products (Amazon SageMaker), MLflow, KServe, Hugging Face

Tech Stack

  • Cloud & Infra: AWS (EKS, Lambda, Kinesis, Secrets Manager/KMS), Terraform/CloudFormation, GitHub Actions/AWS CodePipeline

  • AI Infra: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM, Ray, Spark

  • Ops: Prometheus, Grafana, OpenTelemetry, ArgoCD, OPA

  • Data: Feature stores (Feast), vector DBs (FAISS, Milvus, Pinecone), relational DBs

  • App Layer: APIs, microservices, frontend/backend integration for AI systems

Success Metrics

  • Reliability & Performance: SLOs met for infra services, GPU utilization optimized

  • Security & Compliance: Zero critical findings, auditable infra

  • Cost Efficiency: Reduced GPU/infra spend via FinOps strategies

  • Developer Velocity: Faster provisioning and deployment of AI infra

  • Technical Leadership: Influence on infra standards, mentorship, reusable patterns

Salary:

$103,200.00 - $192,000.00

Pay Type:

Salaried

The above represents BMO Financial Group's pay range and type.

Salaries will vary based on factors such as location, skills, experience, education, and qualifications for the role, and may include a commission structure. Salaries for part-time roles will be pro-rated based on number of hours regularly worked. For commission roles, the salary listed above represents BMO Financial Group's expected target for the first year in this position.

BMO Financial Group's total compensation package will vary based on the pay type of the position and may include performance-based incentives, discretionary bonuses, as well as other perks and rewards. BMO also offers health insurance, tuition reimbursement, accident and life insurance, and retirement savings plans. To view more details of our benefits please visit: Us

At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities, and our people. By working together, innovating, and pushing boundaries, we transform lives and businesses and power economic growth around the world.

As a member of the BMO team you are valued, respected, and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one, for yourself and our customers. We'll support you with the tools and resources you need to reach new milestones as you help our customers reach theirs. From in-depth training and coaching to manager support and network-building opportunities, we'll help you gain valuable experience and broaden your skillset.

BMO is committed to an inclusive, equitable, and accessible workplace. By learning from each other's differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written, and fully executed agency agreement contract for service to submit resumes.


Required Experience:

Staff IC

Key Skills

  • Design
  • Academics
  • AutoCAD 3D
  • Cafe
  • Fabrication
  • Java

About Company

We cover the whole balance sheet, from foreign exchange, trade finance and treasury management to corporate lending, securitization, public and private debt and equity underwriting. Our team of experts can also provide a full range of advisory services, along with industry-leading res ...