Platform SRE Engineer

HelloKindred

Job Location:

Sheffield - UK

Monthly Salary: Not Disclosed
Posted on: 2 hours ago
Vacancies: 1 Vacancy

Job Summary

Anticipated Contract End Date/Length: November 30, 2026.
Work Set Up: Hybrid (3 days per week in office)
Clearance required: BPSS

Our client in the Information Technology and Services industry is looking for a Platform / SRE Engineer to own deployment, observability, reliability, cost control and production operations for an AI helpdesk platform. This role will support the design, deployment and operational management of AI services and production environments while ensuring scalability, uptime, performance optimization and operational resilience across cloud-based infrastructure.

The ideal candidate will bring strong expertise in DevOps and Site Reliability Engineering practices, along with experience managing cloud-native platforms, CI/CD pipelines, observability tooling and AI/ML production workloads within complex enterprise environments.

What you will do:

  • Build and manage CI/CD pipelines, infrastructure and runtime environments for AI services.
  • Deploy and operate model-serving, orchestration and application workloads.
  • Implement monitoring, tracing, alerting, logging and operational dashboards.
  • Manage scaling activities, release processes, rollback mechanisms and production support operations.
  • Optimize inference cost, latency, uptime and overall system reliability.
  • Create runbooks, operational standards and incident response processes.
  • Support infrastructure automation and platform engineering initiatives.
  • Maintain observability and monitoring solutions across production environments.
  • Support release automation, secrets management and production operational processes.
  • Collaborate with engineering teams to support AI platform reliability and operational readiness.
  • Troubleshoot production issues and support system diagnostics and remediation activities.
  • Ensure platform stability, scalability and performance across cloud-native environments.

Qualifications:

  • Strong experience in DevOps and Site Reliability Engineering environments.
  • Experience with Docker, Kubernetes, cloud platforms and Infrastructure as Code practices.
  • Strong experience with monitoring, observability and operational tooling.
  • Familiarity with CI/CD pipelines, release automation, secrets management and production support processes.
  • Understanding of LLM deployment patterns and API-based model integrations.
  • Experience working with cloud platforms, particularly AWS.
  • Experience using Jira, Confluence and ServiceNow.
  • Experience supporting AI/ML workloads in production environments is preferred.
  • Experience with GPU workloads, autoscaling and cost optimization is preferred.
  • Strong troubleshooting, operational support and incident response capabilities.
  • Strong communication and collaboration skills within cross-functional engineering teams.

Additional Information:

All your information will be kept confidential according to EEO guidelines.

Candidates must be legally authorized to live and work in the country where the position is based without requiring employer sponsorship.

HelloKindred is committed to fair, transparent and inclusive hiring practices. We assess candidates based on skills, experience and role-related requirements.

We appreciate your interest in this opportunity. While we review every application carefully, only candidates selected for an interview will be contacted.

HelloKindred is an equal opportunity employer. We welcome applicants of all backgrounds and do not discriminate on the basis of race, colour, religion, sex, gender identity or expression, sexual orientation, age, national origin, disability, veteran status or any other protected characteristic under applicable law.


Remote Work:

No


Employment Type:

Contract

About Company

Who is HelloKindred? HelloKindred are specialists in staffing marketing, creative and technology roles, offering a range of talent solutions that can be delivered on-site, remotely or hybrid. Our vision is to make work accessible and people’s lives better. We do this by disrupting tradi ...