Job Title: SRE Lead, Banking Domain (Wealth Management Preferred)
Location: Downtown Toronto, ON (Onsite, 5 Days/Week)
6-12 Month Contract
Experience: 12 Years
About the Role:
We are looking for a highly skilled Site Reliability Engineering (SRE) Lead with a strong background in the Banking domain, ideally within Wealth Management. The ideal candidate will lead the SRE function to ensure system reliability, scalability, and performance across mission-critical financial applications. This role combines hands-on technical expertise with leadership responsibilities to drive service excellence and operational efficiency.
Key Responsibilities:
- Lead and mentor a team of SREs responsible for the production stability, reliability, and availability of banking and wealth management systems.
- Design and implement monitoring, alerting, and incident response strategies to proactively manage system health.
- Collaborate with development and infrastructure teams to drive DevOps and automation initiatives, ensuring smooth CI/CD pipelines.
- Define and implement SLIs, SLOs, and SLAs to measure and improve service performance.
- Manage and drive incident management, root cause analysis (RCA), and problem resolution to minimize downtime and business impact.
- Lead capacity planning, performance tuning, and disaster recovery strategies.
- Drive observability and resilience engineering best practices across all platforms.
- Work closely with stakeholders in banking and wealth management domains to align reliability goals with business needs.
- Establish governance processes and ensure compliance with financial regulatory and security standards.
- Develop dashboards and reporting metrics to provide visibility into system performance and reliability.
- Champion a culture of continuous improvement, automation, and a reliability-first mindset.
Required Skills & Experience:
- 10 years of total IT experience, with at least 4 years in Site Reliability Engineering or Production Operations leadership roles.
- Strong domain experience in Banking with exposure to Wealth Management systems (highly desirable).
- Expertise in Linux/Unix administration, networking, and cloud infrastructure (AWS, Azure, or GCP).
- Strong scripting and automation experience (Python, Shell, or similar).
- Proficiency in monitoring and observability tools such as Prometheus, Grafana, Splunk, ELK, AppDynamics, or Dynatrace.
- Experience with CI/CD pipelines, Git, Jenkins, Ansible, Terraform, or equivalent tools.
- In-depth understanding of incident, problem, and change management based on ITIL principles.
- Proven track record in managing production systems supporting large-scale, high-availability financial applications.
- Excellent communication, stakeholder management, and team leadership skills.