- Manage end-to-end data and infrastructure operations, from writing SQL queries to CI/CD pipeline creation and optimization to VM and cloud-based deployments
- Drive incident and request management through ServiceNow, ensuring SLA compliance, ownership, and proactive issue resolution
- Implement and refine monitoring and observability frameworks using Datadog, Grafana, and Prometheus to maintain uptime, identify bottlenecks, and enhance system reliability
- Collaborate across global teams, including Data Engineering, Product, and IT Infrastructure, to resolve production issues, improve deployment practices, and optimize system performance
- Conduct root cause analyses and contribute to blameless post-incident reviews and preventive action plans
- Collaborate with security and compliance teams to uphold operational standards and data protection practices
- Contribute to automation and continuous improvement initiatives through scripting (Python, Shell) and infrastructure-as-code (Terraform, Ansible) principles
- Support the data lifecycle, ensuring the accuracy, integrity, and accessibility of data pipelines and dashboards across analytics platforms
- Collaborate with Data Engineering teams to ensure data pipelines, ETL processes, and analytics platforms are performant, reliable, and production-ready
- Collaborate on capacity planning, scaling, and performance optimization to ensure reliability during growth and high-load scenarios
- Use operational metrics (MTTR, uptime, failure rate, latency) to drive service reliability improvements
- Participate in Agile ceremonies within a Scrum/Kanban model, aligning with delivery squads to ensure cross-functional visibility and operational excellence
Experience:
- 6 years in DataOps, DevOps, infrastructure operations, site reliability engineering, or analytics platform support
- Intermediate SQL for data extraction, transformation, and diagnostics
- Strong understanding of CI/CD pipelines (Jenkins, Azure DevOps, Git-based version control)
- Proficiency in monitoring and observability tools (Datadog, Grafana, Prometheus)
- Hands-on experience with Python or Shell scripting for automation and diagnostics
- Familiarity with containerization (Docker, Kubernetes) and cloud platforms (AWS, Azure, GCP); knowledge of AWS services is a must
- Solid grasp of infrastructure-as-code concepts (Terraform, Ansible)
- Proven record in incident management, maintaining SLAs/SLIs/SLOs for critical systems, and handling escalations in enterprise environments
- Analytical Mindset: Ability to interpret system and data metrics, identify trends, and recommend performance improvements
- Collaboration: Strong communication skills with cross-functional, global teams across technical and non-technical domains
- Agility: Comfort working in dynamic, fast-paced environments, maintaining composure and prioritization under pressure
Qualifications:
Must-have skills: Docker (strong), Kubernetes (strong), DevOps on AWS (strong), Terraform.
Good to have: ETL, Python, Shell scripting.
Remote Work:
Yes
Employment Type:
Full-time