SRE/DevOps Engineer - Toronto (4 days onsite)
We are seeking a Senior Site Reliability Engineer for our Application Maintenance and Transformation Data Services and Integration team. As a Senior Site Reliability Engineer, you will bring the engineering mindset of bold ambition, curiosity, and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation.
What you will do:
Set the vision for the SRE product base (monitoring, alerting, self-healing, reliability testing).
Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
Function as portfolio SME (Subject Matter Expert): understand and document the common components, core functionalities, and infrastructure of supported applications.
Actively participate in deploying software applications, automation tools, and IT infrastructure.
Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
Drive transformation by continuously looking for ways to automate existing SRE processes and increase operational efficiency.
Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
Lead incident management and problem management for in-scope applications, and own the fulfillment of RCA (root cause analysis) action items.
Debug production issues across services and levels of the stack and provide primary operational support.
Perform occasional off-hours support.
Must-have:
Bachelor's degree in Computer Science, Electrical or Electronics Engineering, or a related field, or equivalent experience.
3 years of IT experience in software development and/or maintenance, SRE, or DevOps engineering.
1 year of experience building Java Spring Boot applications and developing REST APIs.
Experience working with relational databases (MS-SQL Server, MySQL, MariaDB, SingleStore) or in-memory distributed databases.
Experience working with containerization platforms such as Docker and container orchestration tools like Kubernetes (Azure Kubernetes or OpenShift Kubernetes Service preferred).
Solid Git skills with experience working with popular CI tools (Jenkins or UCD).
Experience working with Windows- and Linux-based infrastructure.
1 year of experience developing cloud-native applications using Java or Python.
Experience writing, tuning, and optimizing SQL queries.
Experience using centralized logging solutions (Splunk, ELK (preferred), etc.) and active monitoring systems (Dynatrace, etc.).
Experience deploying and operating cloud-native applications in a private (OpenShift) or public cloud (Azure/AWS preferred).
Strong, proactive communication skills around the status of projects and issues in production.
Must be a self-starter: motivated, resourceful, and driven to work with cross-functional teams in large enterprises with complex org structures to meet business delivery timelines.
Financial Services domain knowledge, preferably Capital Markets and Wealth Management.
Nice-to-have:
Experience implementing dashboards to help teams visualize logs, instrumentation, and other data to ensure optimal performance of the platform, services, infra, and deployed applications (Grafana preferred).
Exposure to data warehouses like Informatica, Snowflake, or Databricks and business intelligence tools like SAP BO or similar.
Experience creating runbooks, processes, and test plans around reliability, performance, etc. of infrastructure and applications.
Exposure to PagerDuty, Postman, ServiceNow, SonarQube, NexusIQ, and vault tools.
Exposure to event brokers like Kafka or IBM MQ, and to mainframe tools and environments.
Exposure to industry disaster recovery test exercises.