We are looking for a Senior SRE to join our core engineering team in building the next generation of AI-powered property intelligence for the insurance industry. In this role, you will be the guardian of the platform's availability, latency, and performance.
You will work at the heart of a high-demand ecosystem, ensuring that our microservices and AI/ML pipelines running on Google Cloud Platform (GCP) are resilient, scalable, and secure. This role takes a software engineering approach to operations: automation is the default, and manual intervention is a last resort.
Key Responsibilities
Infrastructure & Platform Engineering
Cloud Architecture: Design and manage scalable, multi-regional infrastructure on GCP, leveraging GKE (Kubernetes), Cloud Run, and Pub/Sub.
Infrastructure as Code (IaC): Maintain and evolve our infrastructure codebase using Terraform or Pulumi, ensuring environment parity across Staging and Production.
Optimization: Partner with Fullstack teams to tune application performance, managing memory limits, event loop bottlenecks, and asynchronous execution in a containerized environment.
Observability & Reliability
SLO/SLI Definition: Define and monitor Service Level Indicators (SLIs) and Objectives (SLOs) to measure the health of our property intelligence engine.
Advanced Monitoring: Build comprehensive dashboards and alerting systems using Google Cloud Operations Suite (Stackdriver), Prometheus, or Grafana.
Incident Management: Lead Root Cause Analysis (RCA) for production incidents and run blameless post-mortems to prevent recurrence.
AI & Data Operations
Security & Compliance: Ensure the platform meets the rigorous data privacy standards of the insurance industry, including SOC 2 and GDPR compliance.
Qualifications:
Technical Requirements:
5 years of experience in an SRE, DevOps, or System Architecture role.
GCP Expertise: Deep experience with Google Cloud Platform, specifically GKE, IAM, Cloud SQL, and VPC networking.
Coding Proficiency: Strong experience with (backend services) and scripting in Python or Go for automation.
Orchestration: Expert-level knowledge of Kubernetes (GKE) including Helm charts and service meshes (Istio/Anthos).
CI/CD: Experience building high-frequency deployment pipelines with GitHub Actions, GitLab CI, or Google Cloud Build.
Professional Competencies:
The SRE Mindset: A passion for automation and a visceral dislike of repetitive manual tasks (toil).
Strategic Communication: Ability to translate complex infrastructure risks into business impact for Stakeholders and Delivery Directors.
AI-First Workflow: Proactive use of AI tools for log anomaly detection, predictive scaling, and automated troubleshooting.
Additional Information:
Location: Guadalajara, Jalisco, Mexico (Hybrid)
Benefits and Perks
Perks you enjoy at KMS Mexico
- Mexican law benefits
- 15 days of PTO (in year zero; from the first year onwards, it is 3 days per year).
- 5 days of leave (negotiable) for the death of an immediate family member.
- Major Medical Expenses Insurance with coverage for immediate dependents (spouse and children).
- Annual performance bonus (10% of annualized salary).
- Annual salary adjustment.
- Employee Referral Bonus.
- Paid Certifications / Courses.
- Coursera License.
- 5% Savings Fund.
- 5% Grocery Vouchers.
Remote Work:
No
Employment Type:
Full-time