Our Client is a long-established and highly regarded Central and Eastern European technology provider specialising in designing, building, and operating advanced mission-critical software solutions for large international organisations. With over 25 years of history and a team of 600 top-tier software engineers across Europe, they are a certified partner for high-stakes industries including Aerospace & Defence, Automotive, Telecommunications, and Financial Services. They hold rigorous quality and security certifications such as NATO AQAP and TISAX.
This specific role involves working with a leading European client to develop and maintain their sophisticated cloud platform. You will be responsible for defining the DevOps architecture and strategy for solutions built primarily on Microsoft Azure and Kubernetes. You will play a key part in implementing highly available, scalable, and secure cloud-native architectures, managing the full lifecycle from Infrastructure as Code (IaC) development through to CI/CD and observability. This is a chance to apply your senior expertise to complex, enterprise-grade cloud environments.
Responsibilities
As a Senior DevOps Engineer specialising in AI and Data, you will be key to architecting and maintaining the infrastructure that supports large-scale AI/ML workloads. Your main responsibilities will be:
- Platform Deployment and Management: Design, build, and maintain scalable and secure cloud infrastructure for AI and Big Data solutions, ensuring high availability and performance in a production environment.
- AI/ML Infrastructure Automation: Implement end-to-end automation for the complete Machine Learning (ML) lifecycle, including infrastructure for model training, real-time inference, and data processing.
- Infrastructure as Code (IaC): Develop, manage, and optimise IaC using tools like Terraform to provision and configure cloud resources consistently and efficiently.
- CI/CD Pipeline Development: Create, manage, and improve robust Continuous Integration and Continuous Delivery (CI/CD) pipelines to enable rapid and safe deployment of code, models, and infrastructure changes.
- Kubernetes for AI/Data: Manage and optimise Kubernetes clusters (e.g. AKS, EKS, or GKE) specifically tailored for running containerised data processing frameworks and GPU-enabled ML workloads at scale.
- Monitoring and Observability: Establish advanced monitoring, logging, and alerting systems to ensure the health, performance, and cost efficiency of the data and AI platforms.
- Security and Governance: Implement security measures, access controls, and compliance frameworks to ensure data privacy, model governance, and adherence to security best practices across all environments.
- Cross-Functional Collaboration: Work closely with Data Scientists, AI Engineers, and Software Development teams to translate complex AI requirements into robust, deployable infrastructure solutions.
Requirements
- Proven DevOps Experience: Strong professional background (5 years) in a DevOps, SRE, or Cloud Engineering role focusing on mission-critical production systems.
- Cloud Platform Expertise: Extensive hands-on experience with at least one major cloud provider (AWS, Azure, or GCP), particularly in infrastructure and networking.
- Kubernetes for Production: Expert-level knowledge of Kubernetes administration, operation, and optimisation for high-performance workloads, including scaling and resource management.
- Infrastructure as Code (IaC): Deep practical experience with Terraform for infrastructure provisioning and state management.
- CI/CD Systems: Proficiency in designing and managing automated CI/CD pipelines (e.g. Azure DevOps, GitHub Actions, or GitLab CI).
- Data/ML Tooling: Experience with tools and frameworks commonly used in the Data/AI domain, such as Apache Spark, Apache Airflow, Databricks, or specific AI/ML tooling infrastructure.
- Scripting: Solid proficiency in scripting languages like Python or Bash for automation, testing, and system integration.
- Monitoring: Experience implementing and working with observability stacks (e.g. Prometheus, Grafana, the ELK stack, or cloud-native monitoring).
Our Client Offers
Our Client is committed to supporting your professional journey with a competitive package designed for growth, flexibility, and well-being.
- 100% Remote Work Option within Europe, providing ultimate work-life flexibility.
- Flexible Working Hours and a hybrid work model supported by modern office hubs across Europe.
- Personalised Development Program and access to dedicated mentorship and coaching from domain experts to accelerate your career growth.
- Comprehensive Benefits Package, including additional health insurance and food vouchers.
- The chance to take full ownership and make a tangible impact on mission-critical, cloud-native solutions.
- A supportive, collaborative team culture with regular social events and team-building activities.
Do you need more information? Contact us directly.