We are looking for a highly skilled and experienced Site Reliability Engineer (SRE) who will play a key role in transforming reliability engineering through AI-based innovation while bringing deep expertise in core SRE practices.
This role is not just about applying AI; it's about being a hands-on SRE first: someone who understands real-world operational pain points and knows how to drive systemic improvements through automation, observability, and intelligent tooling. You'll play a key role in institutionalizing SRE best practices and routines and embedding intelligence-driven operations into the engineering culture.
Your efforts will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will play a vital role in supporting the organization's broader strategy, enabling faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes.
Drive strategic initiatives to transform SRE capabilities through AI/ML innovation while setting the vision for reliability engineering and operational excellence.
Leverage AI and machine learning technologies to architect and oversee solutions that advance the overall SRE agenda, improving reliability, automation, observability, and operational efficiency across complex systems.
Own, govern, and continuously improve incident management, change management, and release processes to ensure the highest levels of stability, safety, and velocity.
Lead and champion key SRE practices and routines, driving organization-wide adoption of the SRE Community of Practice (CoP), SLA/SLO alignment, error budget governance, and data-driven process optimization.
Guide and influence cross-functional teams, including SREs and platform engineers, to develop reliable, scalable AI/ML tools and frameworks.
Oversee engineering strategies that improve service reliability, availability, and performance at scale.
Define, build, and evangelize internal frameworks and tooling to accelerate AI/ML adoption across all reliability domains.
Lead Zero-Touch Operations initiatives and roadmap, empowering platforms for autonomous issue detection and resolution.
Leverage advanced metrics, telemetry, and incident data analytics to inform strategic decisions and build enterprise-grade reliability dashboards.
Own on-call strategy, escalation policies, and incident response governance across teams.
Drive security integration across all reliability workflows, leading vulnerability management, compliance, and collaboration with security leadership.
Shape and own the AI-in-SRE strategic vision, serving as a thought leader and mentor to the entire SRE organization.
Extensive experience (5 years) as a senior SRE, Platform Engineer, or DevOps Engineer responsible for large-scale, complex distributed systems, with a strong understanding of AI/ML fundamentals and hands-on experience applying AI-powered tools.
Automation-First Mindset: Demonstrated ability to drive end-to-end automation across incident response, change/release workflows, observability, and daily operations. Strong "never do it twice manually" attitude with a proven track record of eliminating toil through intelligent tooling, scripting, and systematic process optimization.
Expert-level programming and scripting skills (Python, Go, or similar) with experience designing automation at scale.
AI-Accelerated Mindset: You actively leverage modern AI tools (e.g., LLMs) to boost productivity, streamline development workflows, and augment traditional engineering tasks, demonstrating a willingness to adapt and innovate with evolving technologies.
Mastery of core SRE principles, including SLIs/SLOs, incident management, governance, root cause analysis, scalability, fault tolerance, and capacity planning.
Proven leadership in incident, change, and release management, driving automation, auditability, and continuous service reliability improvements.
Strategic ability to establish and evangelize reliability frameworks, rituals, and operational excellence aligned with enterprise-wide goals.
Deep expertise in cloud architectures (AWS, GCP, Azure), container ecosystems (Docker), and orchestration platforms (Kubernetes).
Advanced knowledge of observability systems and the ability to architect enterprise-grade monitoring and alerting solutions.
In-depth understanding of Linux/Unix internals, performance optimization, and complex OS-level production troubleshooting.
Strong grasp of networking, security best practices, vulnerability management, and compliance requirements.
Experience influencing cross-team collaboration and mentoring junior engineers in SRE practices.
LLM-Native Development Approach: Proficiency in using LLM-powered tools for research, automation, or code generation. Experience building custom AI-assisted automations or tools that deliver measurable engineering efficiency gains.
Statistical Quality Verification: Hands-on experience with experimental design, statistical analysis, and scripting to measure the impact of system changes. Familiarity with confidence intervals, significance testing, and frameworks for validating probabilistic AI/ML models.
Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements.
We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or to perform a job, please contact us by emailing .
Required Experience:
Senior IC
Full-Time