Senior Site Reliability Engineer

Datavail Infotech


Job Location:

Bogotá - Colombia

Monthly Salary: Not Disclosed
Posted on: 6 days ago
Vacancies: 1 Vacancy

Job Summary

Description
  • Define and maintain SLIs/SLOs; monitor alignment and error budget usage
  • Lead incident response and postmortems; implement corrective measures
  • Automate operations tasks via tooling (e.g., auto-remediation, scaling rules)
  • Build, improve, and maintain CI/CD pipelines, canary deployments, and blue/green strategies
  • Lead technical discussions with customers to align on reliability, scalability, and performance requirements
  • Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
  • Implement and extend observability systems (metrics, tracing, log aggregation)
  • Optimize performance and cost by tuning cloud services, autoscaling, and resource rightsizing
  • Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
  • Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
  • Participate in architecture discussions around high availability and disaster recovery
  • Mentor mid-level and junior SREs; conduct reliability design reviews
  • 5 to 8 years of experience in a reliability or operations role
  • Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
  • Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
  • Solid coding skills (Python, Go, or equivalent)
  • Experience with IaC, CI/CD pipelines, and monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK)
  • Comfortable with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
  • Experience working in distributed systems and production-scale services

Nice-to-have Skills

  • Exposure to multi-cloud data replication or cross-cloud networks
  • Experience with chaos engineering or fault injection



Required Experience:

Senior IC

About Company

Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leadi ...
