Job Title: Technical Project Manager, SRE Operations
Location: Miami, FL (Onsite)
Duration: Full-time / Permanent
Our client is seeking a highly skilled Technical Project Manager with strong expertise in Site Reliability Engineering (SRE), Automation, Cloud Operations, and AIOps. The ideal candidate will combine technical depth with outstanding project leadership, ensuring high availability, reliability, and automation-driven efficiency across large-scale distributed systems.
Key Responsibilities:
Program & Project Leadership:
- Lead end-to-end delivery of SRE and Operations modernization projects across cloud, network, and platform environments.
- Manage cross-functional engineering teams, vendors, and partners to deliver high-quality solutions on schedule.
- Drive operational transformation through automation, observability, and AI-driven insights.
- Develop detailed project plans, milestones, risk logs, and communication plans for technical initiatives.
Required Skills & Qualifications:
- 12 years of experience in technical project/program management, with at least 4-6 years in SRE/DevOps/Operations.
- Strong understanding of cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker).
- Hands-on familiarity with automation tools:
  - Terraform / Ansible / Jenkins
  - Python / Go / Bash scripting
  - CI/CD pipelines
- Deep knowledge of:
  - Observability stacks (New Relic, Grafana, ELK, Splunk, Catchpoint)
  - Incident and change management systems (ServiceNow, Jira)
- Proven experience deploying or managing AIOps platforms.
- Strong analytical and problem-solving skills, with the ability to lead high-impact incidents.
- Excellent communication, leadership, and vendor management capabilities.
SRE & Operations Management:
- Oversee reliability engineering initiatives: incident management, problem management, capacity planning, and performance optimization.
- Ensure SLO/SLI/SLA compliance across end-to-end infrastructure and customer-facing platforms.
- Implement best-in-class practices for monitoring, alerting, and service resilience.
Automation & AIOps:
- Lead automation programs using scripting, orchestration, and Infrastructure-as-Code (IaC) techniques.
- Champion AIOps solutions for predictive analytics, smart alerting, anomaly detection, and automated remediation.
- Partner with engineering teams to build self-healing capabilities and reduce MTTR.
Stakeholder Management:
- Serve as the primary interface between engineering, operations, product, and leadership teams.
- Present program updates, operational metrics, and business impact to senior executives.
- Ensure stakeholder alignment on priorities, roadmaps, and technical dependencies.
Compliance, Governance & Telecom Standards:
- Oversee governance, security, and compliance in line with industry standards and regulatory requirements.
- Drive continuous improvement through retrospectives, root-cause analysis, and process enhancements.