Role: SRE
Location: Hove, UK
Is it Permanent / Contract: Open to both permanent and contract
Is it Onsite/Remote/Hybrid: Hybrid, 2 days per week in the office
No. of Positions: 1
The successful candidate will work closely with engineering, architecture, and product teams to implement modern reliability practices, automate operational workflows, and establish robust monitoring and incident management frameworks.
Key Responsibilities
Collaborate with engineering teams to modernize IT operations by improving observability, automation, and operational efficiency.
Design and implement observability platforms to effectively monitor system health, performance, and reliability.
Develop strategies for AI-driven alerting and proactive anomaly detection to reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
Establish and enforce SRE best practices, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
Define and implement an AIOps roadmap to enhance operational intelligence and automation.
Automate repetitive operational tasks (toil reduction) using scripting, orchestration tools, and automation frameworks.
Implement self-healing systems and automated incident response mechanisms to support autonomous operations.
Collaborate with cross-functional teams to ensure systems are scalable, resilient, and maintainable.
Lead incident management, root cause analysis, and post-incident improvement initiatives.
Promote shift-left reliability practices across engineering and product teams.
Mentor team members and advocate for a culture of reliability, automation, and continuous improvement.
Required Skills & Experience
Strong expertise in Site Reliability Engineering (SRE) principles and practices.
Hands-on experience implementing observability solutions, particularly with Dynatrace and Datadog.
Strong scripting and automation experience using Python and Ansible.
Experience working with cloud platforms such as AWS and Azure.
Solid understanding of containerization and orchestration technologies, including Docker and Kubernetes.
Experience working with cloud-native distributed systems and microservices architectures.
IT Services and IT Consulting