Lead Site Reliability Engineer

KMS Technology


Job Location:

Guadalajara - Mexico

Monthly Salary: Not Disclosed
Posted on: 7 hours ago
Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

We are seeking a Lead Site Reliability Engineer to spearhead the reliability, scalability, and performance of our AI-powered property intelligence platform. Operating at the intersection of Geospatial AI and Insurance Technology, you will be responsible for a mission-critical Azure ecosystem supporting high-throughput Java microservices.

As a Lead, you will bridge the gap between complex AI model inference and enterprise-grade stability. You will own the Production Excellence mandate, mentoring a team of engineers and collaborating with Senior Delivery Directors to ensure our global infrastructure stays ahead of our rapid growth.

Key Responsibilities

Strategic Infrastructure & Azure Leadership

  • Cloud Architecture: Lead the design of highly available, multi-region architectures on Azure using AKS (Azure Kubernetes Service), Azure Functions, and Service Bus.

  • IaC Governance: Establish and enforce standards for Infrastructure as Code using Terraform or Bicep, ensuring 100% automated provisioning across all environments.

  • Java Performance Engineering: Partner with backend squads to optimize JVM performance through garbage collection tuning and memory management for high-concurrency insurance processing.

Reliability & AI Operations (AIOps)

  • Error Budgeting: Define, negotiate, and manage SLIs, SLOs, and SLAs with product stakeholders, balancing the velocity of AI feature releases with system stability.

  • Advanced Observability: Architect end-to-end monitoring and distributed tracing using Azure Monitor, Application Insights, and ELK/Grafana.

  • Incident Commander: Act as the ultimate escalation point for high-priority incidents, leading complex Root Cause Analysis (RCA) and driving long-term remediation.

Security & Industry Compliance

  • Data Sovereignty: Ensure the platform adheres to insurance-specific data residency requirements and security frameworks (SOC 2, HIPAA, or ISO 27001).

  • Automated Governance: Implement Azure Policy and automated security scanning within CI/CD pipelines to ensure a Secure-by-Design infrastructure.

 


Qualifications:

Technical Leadership:

  • Experience: 7 years in SRE, DevOps, or Cloud Engineering, with at least 2 years in a Lead or Principal capacity.

  • Azure Mastery: Expert-level knowledge of the Azure Well-Architected Framework, specifically around networking (VNet/ExpressRoute) and compute.

  • Java Ecosystem: Deep proficiency in the Java/Spring Boot stack from an operational perspective (JVM profiling, thread dump analysis).

  • Container Orchestration: Mastery of Kubernetes (AKS), including ingress controllers, service mesh (Istio), and cluster security.

Professional Competencies:

  • Strategic Mindset: Ability to translate technical debt and reliability risks into a data-driven business case for leadership.

  • Automation Advocate: Proven track record of eliminating toil through Python, Go, or Java-based automation tooling.

  • Mentorship: Passion for leveling up the engineering organization through workshops, documentation, and pair programming.

  • AI-First Integration: Experience leveraging AI for predictive scaling and automated log summarization to reduce Mean Time to Recovery (MTTR).


Additional Information:

Perks you enjoy at KMS Mexico

  • Mexican law benefits
  • 15 days of PTO in year zero (from the first year onwards, it is 3 days per year).
  • 5 days of leave for the death of immediate family members (negotiable).
  • Major Medical Expenses Insurance with coverage for immediate dependents (spouse and children).
  • Annual performance bonus (10% of annualized salary).
  • Annual salary adjustment.
  • Employee Referral Bonus.
  • Paid Certifications/Courses.
  • Coursera License.
  • 5% Savings Fund.
  • 5% Grocery Vouchers.

Remote Work:

No


Employment Type:

Full-time

About Company

KMS Technology was established in 2009 as a U.S.-based software services company. With development centers in Vietnam and Mexico, we have been trusted globally for the superlative quality of our software consulting & development services, technology solutions, and engineers' expertise ...