DevOps & SRE Lead
Job Summary
Role purpose
- The DevOps & SRE Lead is responsible for ensuring the reliability, scalability, security, and operational excellence of enterprise data and AI platforms and applications.
- This role combines hands-on technical leadership with site reliability practices, enabling high-quality delivery through automation, observability, and strong operational governance.
- The role leads DevOps and SRE practices across platforms, works closely with data engineering, AI/ML, and product teams, and establishes standards that enable teams to build and run reliable systems at scale.
Knowledge, experience & capabilities
DevOps & Platform Engineering
- Lead the design, implementation, and evolution of CI/CD pipelines.
- Define and enforce DevOps standards, tooling, and best practices.
- Drive Infrastructure-as-Code and environment consistency across QA, staging, and production.
- Partner with application, data, and AI teams to embed DevOps practices early in development.
Site Reliability Engineering
- Own platform reliability, availability, performance, and scalability.
- Define and monitor SLOs, SLIs, error budgets, and reliability KPIs.
- Lead incident response, root cause analysis, and post-incident reviews.
- Drive proactive reliability engineering through automation and observability.
Cloud & Infrastructure
- Own cloud platform operations (AWS preferred).
- Ensure secure, cost-efficient, and resilient cloud infrastructure.
- Drive platform upgrades, patching, and lifecycle management.
- Collaborate with security teams on IAM, network security, and compliance.
Observability & Operations
- Implement monitoring, logging, alerting, and tracing frameworks.
- Ensure operational alerts have a high signal-to-noise ratio.
- Continuously improve MTTR, system stability, and operational maturity.
Leadership & Governance
- Provide technical leadership and mentoring to DevOps/SRE engineers.
- Define operating models, on-call processes, and support structures.
- Work with Product Owners and Architects on roadmap planning.
- Act as escalation point for platform and reliability issues.
Skills
- CI/CD: GitHub, GitLab, Azure DevOps, Jenkins
- Cloud: AWS / Azure / GCP (strong in at least one)
- Infrastructure as Code: Terraform, CloudFormation, ARM
- Containers & Orchestration: Docker, Kubernetes
- Observability: CloudWatch, Prometheus, Grafana, Datadog
- Scripting: Python, Bash (or equivalent)
Critical success factors & key challenges
- Platform uptime and reliability metrics (SLOs).
- Domain team adoption rate of the self-serve platform.
- Mean Time to Detection (MTTD) and Recovery (MTTR) for platform incidents.
- Strong ownership mindset
- Excellent collaboration and communication skills
- Ability to balance speed, stability, and governance
Qualifications :
Qualification & Experience
- Graduate
- 10-12 years in DevOps, SRE, or Platform Engineering roles
- 3 years in a technical lead or senior ownership role
- Experience supporting production-critical platforms at scale
Additional Information :
Innovations
Employee may, as part of his/her role and possibly through multifunctional teams, participate in the creation and design of innovative solutions. In this context, Employee may contribute to inventions, designs, and other work product, including know-how, copyrights, software, innovations, solutions, and other intellectual assets.
Remote Work :
No
Employment Type :
Full-time
About Company
To help feed 10 billion people while reducing emissions and improving biodiversity. This is our mission as the global agriculture technology leader. With 59,000 employees in more than 100 countries and hundreds of thousands of agricultural partners worldwide, we are committed to transfo ...