Principal AI Cloud Engineer – Full Stack

Job Location

Toronto - Canada

Salary

$103,200 - $192,000

Vacancy

1 Vacancy

Job Description

Application Deadline:

10/30/2025

Address:

100 King Street West

Job Family Group:

Data Analytics & Reporting

We are back in office 2-4 days/week! This role is not remote/virtual.

The Team

We accelerate BMO's AI journey by building enterprise-grade, cloud-native AI solutions. Our team combines engineering excellence with cutting-edge AI to deliver scalable, secure, and responsible solutions that power business innovation across the bank. We enable and accelerate our partners on their AI journeys across the enterprise, helping teams across BMO unlock value at scale. We support one another in times of need and take pride in our work. We are engineers, AI practitioners, platform builders, thought leaders, multipliers, and coders. Above all, we are a global team of diverse individuals who enjoy working together to create smart, secure, and scalable solutions that make an impact across the enterprise. Our ambition is bold: deploy our capital and resources to their highest and most profitable use through a digital-first operating model powered by data and AI-driven decisions.

The Impact

As a Principal AI Cloud Engineer – Full Stack, you are a hands-on technical developer who designs, builds, and scales cloud-native AI solutions and products. You help set engineering standards, establish patterns, mentor senior engineers, and partner with multiple teams to deliver resilient, governed, and cost-efficient AI at enterprise scale.

You'll help shape and evolve our AI cloud strategy, from model serving and LLMOps to security, observability, and compliance, so teams across the bank can innovate safely and rapidly.

You will advance BMO's Digital First strategy by:

  • Defining reference and production-grade solutions for AI/GenAI on cloud (Azure preferred; multi-cloud aware).

  • Building reusable, secure, and observable components (APIs, SDKs, microservices, pipelines).

  • Operationalizing LLMs and RAG with strong controls and Responsible AI guardrails.

  • Driving platform roadmaps that enable faster delivery, lower risk, and measurable business outcomes.

What's In It for You

  • Influence the technical direction of enterprise AI and the platform primitives others build on.

  • Ship high-impact systems used across many business lines and products.

  • Work across the full stack: cloud infra, data/feature pipelines, model serving, LLMOps, and DevSecOps.

  • Partner with a leadership team invested in your growth and thought leadership.

Responsibilities

Product Builder

  • Build and operate AI/ML cloud-native systems: frontend, backend, integration with other systems, feature stores, training/serving infra, vector databases, model registries, CI/CD, canary/blue-green deployments, and GitOps for AI.

  • Implement cloud-native ML/LLM observability (latency, cost, drift, hallucination/guardrails, quality & safety metrics), logging/tracing (OpenTelemetry), and SLOs/SLIs for production AI systems.

  • Design and implement robust CI/CD pipelines for AI/ML workloads using GitHub Actions and Azure DevOps, including automated testing, model validation, security scanning, model versioning, and blue/green or canary deployments, to ensure safe, repeatable, and auditable releases.

  • Drive FinOps for AI/GPU workloads (rightsizing, autoscaling, spot, caching, inference optimization).

Strategy

  • Help evolve the cloud AI reference design (networking, security, data, serving, observability) for ML/GenAI workloads (batch, streaming, online) with HA/DR, multi-region patterns, and cost efficiency.

  • Work on standards and best practices for containerization, microservices, serverless, event-driven design, and API management for AI systems.

GenAI & LLMOps

  • Architect RAG systems (chunking, embeddings, vector stores, grounding, evaluation) and guardrail frameworks (prompt/content safety, PII redaction, jailbreak & injection defenses).

  • Lead model serving (LLMs and traditional ML) using performant runtimes (e.g., TensorRT-LLM, vLLM, Triton/KServe) and caching strategies; optimize token usage, throughput, and cost.

  • Guide fine-tuning/PEFT/LoRA strategies, evaluation frameworks (offline/online A/B), and safety/quality scorecards; standardize prompt libraries and prompt engineering patterns.

Security, Risk & Governance

  • Implement defense-in-depth: IAM least privilege, private networking, KMS/Key Vault, secrets management, image signing/SBOM, policy-as-code (OPA/Azure Policy), and data sovereignty controls.

  • Embed Responsible AI: model documentation, lineage, explainability, fairness testing, and human-in-the-loop patterns; align to model risk management and audit needs.

  • Ensure regulatory and privacy compliance (e.g., PII handling, encryption in transit/at rest, approved data sources, retention & residency).

Delivery & Operations

  • Lead complex discovery and solution design with stakeholders; build strong business cases (value, feasibility, ROI).

  • Oversee production readiness and operate platforms with SRE principles (SLOs, error budgets, incident response, chaos testing, playbooks).

  • Mentor engineers; multiply team impact via reusable components, templates, and inner-source.

Qualifications

Must Have

  • Bachelor's, Master's, or PhD in Computer Science, Engineering, Mathematics, or a related field (or equivalent experience).

  • 7 years building large-scale distributed cloud systems; 5 years hands-on with cloud (Azure preferred; AWS/GCP nice to have).

  • Proven experience designing and operating production ML/GenAI systems (training, serving, monitoring) and shipping AI features at scale on cloud.

  • Strong software engineering in Python (and one of Go/Java/TypeScript); deep expertise with APIs, async patterns, and performance optimization.

  • Hands-on with MLOps/LLMOps: MLflow, KServe/Triton, Feast/feature stores, vector DBs (e.g., FAISS, Milvus, Pinecone, pgvector, Cosmos DB with vectors), orchestration (Airflow/Prefect), and CI/CD for ML (GitHub Actions/Azure DevOps).

  • Cloud-native stack: Kubernetes (AKS/EKS), containers, service mesh/ingress, serverless (Azure Functions/Lambda), IaC (Terraform/Bicep), secrets & key management, VNet/Private Link/peering.

  • GenAI production experience: RAG, evaluation, prompt engineering, fine-tuning/PEFT/LoRA, and integration with providers (e.g., Azure OpenAI/OpenAI, Anthropic, Google, open-source models via Hugging Face).

  • Excellent communication; ability to influence across engineering, product, security, and risk.

Nice to Have

  • GPU systems & inference optimization (CUDA/NCCL, TensorRT-LLM, vLLM, TGI); Ray/Spark/Databricks for distributed training/inference.

  • Observability: Prometheus/Grafana, OpenTelemetry, ML observability (e.g., WhyLabs, Arize), data quality (Great Expectations).

  • Event streaming and real-time systems (Kafka/Event Hubs), micro-batching, CQRS.

  • Search & knowledge systems (Elastic, OpenSearch, knowledge graphs).

Tech You'll Use (Illustrative)

  • Cloud & Infra: Azure (AKS, Functions, App Service, Event Hubs, API Management, Key Vault, Private Link, Monitor), Terraform/Bicep, GitHub Actions/Azure DevOps.

  • AI/ML: Python, PyTorch, ONNX, MLflow, Hugging Face, LangChain/LangGraph, OpenAI/Azure OpenAI, Anthropic, vector DBs (FAISS/Milvus/Pinecone/pgvector/Cosmos DB vectors).

  • Serving & Ops: KServe/Triton, vLLM/TensorRT-LLM, Prometheus/Grafana, OpenTelemetry, Great Expectations, ArgoCD/GitOps, OPA/Azure Policy.

  • Data & Orchestration: Spark/Databricks, Ray, Airflow/Prefect, Kafka/Event Hubs, Feast/feature store patterns.

How You'll Measure Success

  • Reliability & Performance: SLOs met for AI services (latency, availability, quality); scalable throughput and GPU/infra efficiency.

  • Security & Compliance: Zero critical findings; auditable lineage and model documentation; RAI controls consistently applied.

  • Developer Velocity: Time-to-first-model and time-to-production reduced via reusable components and golden paths.

  • Business Impact: Clear ROI, adoption across lines of business, and measurable customer/employee experience improvements.

  • Technical Leadership: Mentorship, architectural influence, and uplift across teams; strong cross-functional partnerships.

Notes

  • Additional responsibilities may be assigned based on your career growth ambitions and evolving enterprise needs.

  • This role is an individual contributor position in senior technical leadership (Principal), driving impact through architecture, code, and influence rather than direct line management.

Salary:

$103,200.00 - $192,000.00

Pay Type:

Salaried

The above represents BMO Financial Group's pay range and type.

Salaries will vary based on factors such as location, skills, experience, education, and qualifications for the role, and may include a commission structure. Salaries for part-time roles will be pro-rated based on the number of hours regularly worked. For commission roles, the salary listed above represents BMO Financial Group's expected target for the first year in this position.

BMO Financial Group's total compensation package will vary based on the pay type of the position and may include performance-based incentives, discretionary bonuses, as well as other perks and rewards. BMO also offers health insurance, tuition reimbursement, accident and life insurance, and retirement savings plans. More details of our benefits are available on BMO's website.

At BMO, we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities, and our people. By working together, innovating, and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team, you are valued, respected, and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one, for yourself and our customers. We'll support you with the tools and resources you need to reach new milestones as you help our customers reach theirs. From in-depth training and coaching to manager support and network-building opportunities, we'll help you gain valuable experience and broaden your skillset.

To find out more, visit BMO's website.

BMO is committed to an inclusive, equitable, and accessible workplace. By learning from each other's differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written, and fully executed agency agreement contract for service to submit resumes.


Required Experience:

Staff IC

Employment Type

Full-Time
