- Manage end-to-end data and infrastructure operations, from writing SQL queries to CI/CD pipeline creation and optimization to VM and cloud-based deployments
- Drive incident and request management through ServiceNow, ensuring SLA compliance, ownership, and proactive issue resolution
- Implement and refine monitoring and observability frameworks using Datadog, Grafana, and Prometheus to maintain uptime, identify bottlenecks, and enhance system reliability
- Collaborate across global teams, including Data Engineering, Product, and IT Infrastructure, to resolve production issues, improve deployment practices, and optimize system performance
- Conduct root cause analyses and contribute to blameless post-incident reviews and preventive action plans
- Collaborate with security and compliance teams to uphold operational standards and data protection practices
- Contribute to automation and continuous improvement initiatives through scripting (Python, Shell) and infrastructure-as-code (Terraform, Ansible) principles
- Support the data lifecycle, ensuring the accuracy, integrity, and accessibility of data pipelines and dashboards across analytics platforms
- Collaborate with Data Engineering teams to ensure data pipelines, ETL processes, and analytics platforms are performant, reliable, and production-ready
- Collaborate on capacity planning, scaling, and performance optimization to ensure reliability during growth and high-load scenarios
- Use operational metrics (MTTR, uptime, failure rate, latency) to drive service reliability improvements
- Participate in Agile ceremonies within a Scrum/Kanban model, aligning with delivery squads to ensure cross-functional visibility and operational excellence
Experience:
- 6 years in DataOps, DevOps, infrastructure operations, site reliability engineering, or analytics platform support
- Intermediate SQL for data extraction, transformation, and diagnostics
- Strong understanding of CI/CD pipelines (Jenkins, Azure DevOps, Git-based version control)
- Proficiency in monitoring and observability tools (Datadog, Grafana, Prometheus)
- Hands-on experience with Python or Shell scripting for automation and diagnostics
- Familiarity with containerization (Docker, Kubernetes) and cloud platforms (AWS, Azure, GCP); knowledge of AWS services is a must
- Solid grasp of infrastructure-as-code concepts (Terraform, Ansible)
- Proven record in incident management, maintaining SLAs/SLIs/SLOs for critical systems, and handling escalations in enterprise environments
- Analytical Mindset: Ability to interpret system and data metrics, identify trends, and recommend performance improvements
- Collaboration: Strong communication skills with cross-functional, global teams across technical and non-technical domains
- Agility: Comfort working in dynamic, fast-paced environments, maintaining composure and prioritization under pressure
Qualifications:
Must-have skills: Docker (strong), Kubernetes (strong), DevOps on AWS (strong), Terraform.
Good to have: ETL, Python, Shell scripting.
Remote Work:
Yes
Employment Type:
Full-time