Work Mode: Hybrid (2 days per week in-person at Toronto office preferred)
Skills required:
10-12 years in technical program/project management, with at least 3-5 years in data platforms and AI/ML operations.
Strong understanding of data architectures (lake/lakehouse, warehouse, streaming), data governance, and MLOps/ModelOps concepts.
MLOps/AI: Azure ML, SageMaker, Vertex AI; MLflow, model registry, feature stores, drift/fairness/explainability tools.
Data Governance: Purview, Collibra, Alation; data lineage, cataloging, DQ tooling.
Orchestration and CI/CD: Airflow, Prefect, dbt; GitHub Actions/Azure DevOps/Jenkins; Terraform/Bicep/CloudFormation.
Monitoring/Observability: Prometheus/Grafana, cloud-native monitors, logging, data quality monitors, model monitoring.
Cloud & Data: Azure (Synapse, Fabric), AWS (S3/Glue/Redshift), GCP (BigQuery/Dataflow), Databricks, Snowflake.
Proven experience embedding security/privacy-by-design and RAI principles into delivery and ops.
Excellent stakeholder management, vendor management, and executive communication skills.
Roles and responsibilities
Program Delivery Leadership
Own end-to-end delivery of data platform and AI/ML operational initiatives: discovery, design, implementation, hypercare, and steady-state operations.
Maintain multi-quarter roadmap, backlog, and release trains (Scrum, Kanban, SAFe); run standups, PI planning, demos, and retros.
Manage dependencies across data ingestion, storage, processing, cataloging, lineage, access, MLOps pipelines, and app integrations.
Orchestrate cross-functional squads (Data Engineering, Platform, SRE, Security, Risk, Legal, and Business) to deliver secure, governed, and compliant data capabilities and AI services at scale.
Own roadmaps, delivery governance, risk controls, release management, and post-production reliability for data and AI workloads, ensuring Responsible AI principles are codified into day-to-day operations.
Platform Technical Ownership
Partner with Platform Engineering and SRE to evolve the data platform reference architecture.
Drive integration and operationalization of MLOps and ModelOps practices.
Oversee environment strategy (dev, test, stage, prod), IaC-driven provisioning, cost guardrails, and performance SLAs.
Responsible AI and Data Governance
Embed Responsible AI guardrails into SDLC and runtime: model cards, fairness/bias checks, explainability, human-in-the-loop, monitoring, drift, and incident response.
Operationalize data governance: metadata, catalog, lineage, PII classification, DLP, RBAC (Role-Based Access Control), ABAC (Attribute-Based Access Control), data quality SLAs, and retention/deletion schedules.
Align with privacy, security, and regulatory frameworks (e.g., privacy laws, model risk management, and AI assurance frameworks).
Risk and Compliance Controls
Maintain risk register, control library, audit trail, approvals, and evidence for releases and model lifecycle events.
Run change advisory board (CAB) workflows for platform and model changes; ensure traceability from requirements to deployment and monitoring.
Stakeholder Management and Communication
Translate business outcomes into measurable platform and AI service capabilities, SLIs, and SLOs.
Provide executive-level status (OKRs, KPIs, burn-up/down, RAID, budget vs. actuals).
Certifications (nice-to-have):
PMP/PRINCE2, CSM/SAFe, Azure/AWS/GCP data/AI, Databricks/Snowflake, Governance/Privacy.
IT Services and IT Consulting