Role: SRE - AWS
Location: Toronto, ON
Contract
JD Below:
Resiliency & Operational Excellence - AWS Serverless, Dynatrace
Reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance using Dynatrace-driven observability.
- Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
- SLOs / SLIs / Error Budgets for critical APIs
- Incident analysis and post-incident reviews
- Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
- Operational excellence improvements: incident reduction, MTTR improvement, toil automation
- Reliability guardrails embedded into CI/CD and production readiness reviews
Core Responsibilities
- Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
- Lead major incidents and drive actionable RCAs with sustained fixes
- Build signal-driven alerts aligned to SLOs (noise-reduction focus)
- Enable automation & self-healing where feasible
Required Experience
- 5-6 years in SRE/DevOps/Production Engineering
- Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
- Strong expertise in Dynatrace for serverless monitoring & triage
- Proven success improving availability, MTTR, and incident trends
- Solid coding/scripting (Python / Java / )