Description
We are seeking a Delivery SRE leader who will ensure security applications are delivered with strong SDLC discipline and measurable reliability. This role partners closely with Product Owners and engineering leadership to challenge assumptions, sharpen the Definition of Done, and bake SRE requirements into the design and build phases. The leader will govern operational readiness, quality gates, and resilience practices so that every release meets agreed SLOs and is production ready.
Key Responsibilities:
- Define and enforce quality gates across requirements, design, secure coding, testing, release, and post-production monitoring. Translate business objectives into clear, testable requirements that include reliability, availability, performance, security, and observability.
- Establish and manage SLOs/SLIs and error budgets; ensure they are integrated into product roadmaps and delivery plans. Challenge Product Owners and teams to meet a rigorous, objective Definition of Done (DoD) before release.
- Sample DoD checklist: SLOs defined and monitored; alerts tuned; runbooks and escalation paths in place; automated tests (unit, integration, security) passing; performance and capacity validated; resilience and failover tested; rollback verified; vulnerability findings remediated; compliance controls and audit artifacts complete; documentation and support readiness confirmed.
- Lead operational readiness reviews and triage risks; ensure timely remediation and prevention of recurrence through root-cause analysis and auto-remediation.
- Maintain logging, alerting, and monitoring platforms; ensure dashboards provide health and performance visibility. Govern CI/CD pipeline controls for security, reliability, and change management; promote automation to eliminate toil.
- Lead and participate in critical incident response (including outside business hours when needed); drive post-incident reviews and resilience improvements. Monitor delivery health and operational KPIs; lead continuous improvement across teams and products.
- Oversee capacity planning and resilience management for large-scale distributed systems. Partner with engineering on public cloud best practices (AWS or equivalent) for compute, storage, networking, messaging, automation (CloudFormation, Terraform), and data services.
- Build a culture of collaboration, reliability, and continuous improvement; coach teams to adopt DevOps and SRE practices. Partner with regional engineering leaders to drive operational best practices and consistent execution. Provide concise, outcome-focused updates to management and stakeholders; influence decisions across Product, Engineering, SRE, and Security.
Required Qualifications, Capabilities, and Skills
Required Experience:
Senior IC