Site Reliability Engineer (SRE) Manager

Concord USA

Job Location:

Monterrey - Mexico

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Location: Hybrid in Monterrey, MX. 8 days a month on-site.

Possibility to get a relocation stipend if not currently based in Monterrey.

Requirement: Must be legally authorized to work for any Mexican employer without sponsorship now or in the future.


About Us
Concord isn't your typical consulting firm; we're an execution-focused company passionate about delivering results. Our mission is to help clients enhance customer experiences, optimize operations, and revolutionize product offerings through seamless integration, optimization, and activation of technology and data.

Our services and solutions include Digital Experience (Salesforce, Headless Commerce, UI/UX), Data and Analytics (Snowflake, Databricks, Martech Analytics), and Engineering and Application Services (Application Modernization, Greenfield Apps, Portal Buildout, etc.).


About the Role
We are seeking a strategic, technically adept, and hands-on SRE Manager to lead the reliability, scalability, and operational excellence of our production systems. This role is ideal for a leader who thrives in high-pressure environments, excels at debugging complex production issues, and is passionate about building and mentoring high-performing teams.

The SRE Manager will be responsible for hiring and managing a team of SREs, driving incident response and postmortem processes, and collaborating with multiple product teams to build and maintain robust CI/CD pipelines and deployment practices. This role demands a strong sense of ownership, a deep understanding of cloud-native infrastructure, and the ability to lead by example.

Business Alignment
The SRE Manager will partner with business stakeholders to ensure reliability goals support customer experience, compliance, and growth targets. This includes aligning SRE initiatives with broader business objectives such as revenue protection, innovation, and regulatory adherence.

Key Responsibilities

  • Build and lead a high-performing Site Reliability Engineering team.
  • Create individualized development plans for SREs, encourage participation in industry conferences, and support certification programs.
  • Debug and resolve complex production issues, ensuring minimal downtime and rapid recovery.
  • Own the incident lifecycle, including coordination, communication, and creation of detailed postmortem documentation.
  • Implement blameless postmortems and maintain a library of runbooks for common incident types.
  • Follow up with product teams to ensure resolution and implementation of long-term fixes.
  • Partner with internal product and engineering teams to understand infrastructure needs and deliver scalable, secure, and reliable solutions.
  • Drive the design, implementation, and automation of cloud infrastructure using Azure, Terraform, and Kubernetes (AKS).
  • Lead the adoption and management of tools such as Argo CD, Argo Workflows, Azure DevOps, and Octopus Deploy.
  • Architect and manage API Gateways, WAFs, Service Mesh, and multi-cloud networking (VNets, private networks).
  • Establish and enforce deployment best practices, including documentation, versioning, rollback strategies, and environment management.
  • Collaborate with product teams to build and maintain CI/CD pipelines, ensuring reliable and repeatable deployments.
  • Foster a culture of ownership, accountability, and continuous improvement across the team.
  • Define and track key performance indicators (KPIs) for system reliability and team effectiveness.
  • Define and manage Service Level Objectives (SLOs) and error budgets for all critical services.
  • Lead the adoption of advanced observability tools for proactive reliability management.
  • Collaborate with security, compliance, and architecture teams through joint reviews, shared dashboards, and audits to ensure infrastructure meets enterprise standards.


Required Qualifications

  • 10 years of experience in infrastructure, DevOps, or SRE roles, with 3 years in a technical leadership or management capacity.
  • Proven experience debugging and resolving production issues in large-scale systems.
  • Experience building and scaling cloud-native infrastructure on Azure.
  • Deep expertise in Kubernetes (AKS), CI/CD pipelines, and Infrastructure as Code (Terraform).
  • Strong understanding of networking, VNets, private cloud connectivity, and multi-cloud architectures.
  • Hands-on experience with Argo CD, Argo Workflows, and Azure DevOps.
  • Demonstrated ability to hire, mentor, and lead engineering teams.
  • Excellent communication and stakeholder management skills.
  • Strong problem-solving mindset with a bias for action and ownership.
  • Ability to create and maintain detailed deployment documentation and lead by example in operational excellence.
  • Advanced English proficiency (C1 or C2) with proven success collaborating in global English-speaking environments.

Preferred Qualifications

  • Experience supporting internal product teams or platform engineering organizations.
  • Familiarity with FinOps, cost optimization, and cloud governance.
  • Exposure to compliance frameworks (SOC2, ISO, HIPAA).
  • Experience with service mesh technologies (Istio, Linkerd).
  • Knowledge of emerging technologies such as AI/ML ops, edge computing, and sustainability practices.

What Success Looks Like

  • A high-performing SRE team that operates with autonomy and accountability.
  • Internal customers view the SRE team as a trusted partner in delivering reliable, scalable systems.
  • Infrastructure is automated, observable, and resilient by design.
  • Incidents are rare, well-managed, and always lead to learning and improvement.
  • CI/CD pipelines are robust, well-documented, and consistently deliver high-quality deployments.

***

Concord is an execution partner helping organizations drive digital transformation, modernization, and scalable technology solutions. We deliver real results that solve real business challenges. We operate globally and are growing fast, shaping the future of technology. Join a team trusted by top companies to drive strategic growth and operational excellence!


Required Experience:

Manager

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Concord is a technology consultancy blending style with substance to create flawless customer experiences backed by powerful analytics and underwritten by strong data foundations. With the refinement of an agency, the grit of a startup, and the experience of an institution, we create ...
