Lead Platform Engineer (DevOps & MLOps)

Shore Consulting


Job Location:

Toronto - Canada

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

Reporting to the Director, Platform Services, the Lead DevOps Engineer is a senior, hands-on technical leader responsible for building, operating, and continuously improving Afflo's production and non-production cloud environments, CI/CD pipelines, observability, and operational toolchain. This role translates the VP's operational strategy, standards, and compliance objectives into reliable, scalable, secure implementation while leading execution across the DevOps function day to day.

You will work closely with Product Engineering, QA, Service Management, Implementation/Project Delivery, and external vendors to ensure Afflo services meet uptime, performance, security, and audit expectations in regulated healthcare contexts. You will also mentor other DevOps engineers, lead incident response and prevention work, and drive practical improvements that reduce operational risk and accelerate safe delivery.

This role is demanding and diverse, involving:

  • Operational ownership of cloud infrastructure and delivery pipelines

  • Release engineering and environment lifecycle management

  • Observability, incident leadership, and continuous improvement

  • Security controls, evidence readiness, and DR/BCP execution

  • Tooling automation that reduces toil and improves team productivity

Responsibilities

Operational Ownership

  • Own the reliability and day-to-day operation of Afflo environments (production and non-production), ensuring uptime, performance, responsiveness, and strong operational hygiene.

  • Lead triage, mitigation, and restoration during incidents; coordinate with Service Management and engineering stakeholders through resolution.

  • Conduct and author post-incident reviews, and drive prevention work to reduce recurrence, improve MTTR, and increase change safety.

  • Establish and maintain on-call standards, escalation paths, maintenance practices, and operational runbooks aligned with IT Operations and System Administration policies.

 

Cloud Infrastructure Engineering (IaC-First)

  • Design, build, and maintain secure, resilient cloud infrastructure using Infrastructure as Code (IaC) with reusable modules, review discipline, and predictable environment patterns.

  • Build and improve environment lifecycle workflows (provision, reset, clone, teardown) for QA/UAT/demo/customer environments and internal team needs.

  • Implement secure-by-default patterns: network segmentation, least privilege, secrets handling, encryption, audit logging, and access reviews.

  • Perform capacity planning and cost optimization, balancing availability, scalability, and operating cost, and providing actionable recommendations to the VP of Delivery.

  • Design, set up, and maintain AI-specific workloads and pipelines (e.g., data processing, model training, inference).

 

CI/CD, Release Engineering, and Delivery Enablement

  • Build and maintain automated CI/CD pipelines to enable rapid, safe deployments, including release gates, automated checks, artifact integrity, and rollback readiness.

  • Participate in and/or lead major release windows and maintenance deployments; ensure readiness checks, comms coordination, and post-release verification.

  • Standardize release processes across teams/products to reduce variance, improve predictability, and support project timelines and SLAs.

  • Partner with Product Engineering and QA to improve test reliability, deployment quality, and developer experience.

Observability and Monitoring

  • Implement and maintain monitoring, alerting, logging, and dashboards that provide actionable signals for availability, performance, security, and data integrity.

  • Reduce alert noise and improve detection coverage through tuning, SLO/SLI development, and automated verification checks.

  • Provide operational insights to engineering teams, using logs and metrics to identify trends, performance constraints, and failure patterns.

Security, Compliance, and Audit Readiness Enablement

  • Implement operational controls and evidence-producing mechanisms aligned with IT Operations policies and selected frameworks (e.g., SOC2/ISO-aligned practices).

  • Support security and governance requests by producing operational materials (diagrams, environment descriptions, safeguards, maintenance practices) and operational evidence in a timely manner.

  • Coordinate with Security/Service Management on vulnerability management, patching practices, vendor security events, and operational monitoring requirements.

  • Contribute to disaster recovery and business continuity readiness by maintaining runbooks, validating backups/restores, and participating in recovery exercises/tabletop tests.

Tooling, Internal Enablement, and Cross-Team Support

  • Support onboarding/offboarding and access provisioning across enterprise tools (email, document storage, chat, ticketing, VPN, dev/QA environment access), emphasizing least privilege and traceability.

  • Build automation/scripts to streamline frequent employee tasks and reduce operational toil.

  • Maintain a clear, prioritized operational ticket pipeline; triage requests, track outcomes, and communicate progress and risks.

Vendor Collaboration and Operational Toolchain

  • Work with vendors and internal stakeholders to procure, configure, and maintain operational tooling (hosting, monitoring, backups, authentication services, pipeline tools).

  • Coordinate vendor-driven maintenance/outages and ensure internal and customer-facing communications occur when required.

  • Provide practical input on tool selection and implementation feasibility aligned to the VP's standards and roadmap.

Leadership Within the DevOps Function

  • Mentor DevOps engineers through pairing, code/IaC reviews, incident coaching, and documentation/runbook development.

  • Raise team maturity by defining "how we do it here": templates, standards, checklists, guardrails, and repeatable operational processes.

  • Serve as senior escalation for complex infrastructure/pipeline issues and lead cross-team problem-solving efforts.


Qualifications :

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

  • 7 to 10 years of progressive experience in DevOps/SRE/infrastructure engineering, including ownership of production systems.

  • Strong Linux, networking, and troubleshooting skills across distributed systems.

  • Advanced experience with cloud environments (Azure and/or GCP preferred; multi-cloud exposure is an asset).

  • Expert-level Infrastructure as Code experience (e.g., Terraform/Pulumi), including modular design, review practices, and safe change management.

  • Strong Kubernetes experience (operations, deployments, security posture, cluster/platform troubleshooting).

  • Strong experience with designing and building AI training and inference workflows in a cloud environment.

  • Proven CI/CD and release engineering experience (e.g., GitLab CI, Jenkins, ArgoCD, or equivalent), including quality gates and safe deployment strategies.

  • Proven experience with software development lifecycle (SDLC) methodologies and best practices.

  • Experience with IT Service Management (ITSM) tools (ServiceNow, JIRA Service Management, BMC Remedy) and Kanban project management (JIRA Software or equivalent).

  • Demonstrated incident leadership (on-call participation, incident coordination, RCA authorship, prevention follow-through).

  • Security-minded approach: least privilege, secrets management, vulnerability management, audit logging, and regulated-environment operational discipline.

  • Excellent written and verbal communication skills; strong documentation habits (knowledge base, runbooks, diagrams, procedures).

  • Ability to work under deadlines, switch contexts quickly, and deliver across multiple initiatives.


Additional Information :

Nice-to-Haves

  • Experience supporting regulated healthcare or PHI-adjacent environments and governance expectations.

  • Experience supporting SOC2/ISO-style audits (evidence, control operation, policy-driven operations).

  • Familiarity with internal IT tooling and identity/access systems (e.g., SSO, VPN, device management patterns).

  • Experience building internal developer platforms or "golden path" delivery tooling.


Remote Work :

No


Employment Type :

Full-time

About Company

Shore is an IT and strategy consulting firm focusing on innovation in the public sector. We deliver services and tools that advance public sector organizations and the services they provide. Shore’s working environment is flexible, collaborative, and down to earth. We work hard and de ...
