Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)

Porto - Portugal

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

We are hiring a senior MLOps/DevOps/SRE hybrid who can build the infrastructure for an entire AI platform end-to-end. This is not a research role and not a standard ML Engineer role. If you haven't designed production-grade MLOps infrastructure, haven't built CI/CD for ML, or haven't deployed ML workloads on Kubernetes at scale, this role is not a fit.

You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that support our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.

Location: Remote - Europe (PL/ES/PT/CZ/CY)

Key Responsibilities:

MLOps Platform Architecture (from scratch)

Design and build AWS-based AI/ML infrastructure using Terraform (required).
Define standards for security, automation, cost efficiency, and governance.
Architect infrastructure for ML workloads, GPU/accelerator usage, scaling, and high availability.

Kubernetes & Model Deployment

Architect, build, and operate production Kubernetes clusters.
Containerize and productize ML models (Docker, Helm).
Deploy latency-sensitive and high-throughput models (ASR/TTS/NLU/Agentic AI).
Ensure GPU and accelerator nodes are properly integrated and optimized.

CI/CD for Machine Learning

Build automated training, validation, and deployment pipelines (GitLab/Jenkins).
Implement canary, blue-green, and automated rollback strategies.
Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability

Implement full observability (Prometheus, Grafana).
Own uptime, performance, and reliability for ML production services.
Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership

Work closely with ML engineers, researchers, and data scientists.
Translate experimental models into production-ready deployments.
Define best practices for MLOps across the company.

Qualifications and Skills:

We're looking for a senior engineer with a strong DevOps/SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure automation and hands-on MLOps experience.

5 years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
Strong experience designing, building, and maintaining Kubernetes clusters in production.
Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
Solid programming skills in Python or Go for building automation tooling and ML workflows.
Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM/Agentic AI).
Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

Ways to Know You'll Succeed

You enjoy building platforms from the ground up and owning technical decisions.
You're comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
You like solving performance, automation, and reliability challenges in distributed systems.
You bring a structured, pragmatic, and scalable approach to infrastructure design.
Energetic and proactive individual with a natural drive to take initiative and move things forward.
Enjoys working closely with people: researchers, ML engineers, cloud architects, and product teams.
Comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
Collaborative mindset: you like to build together, not work in isolation.
Strong ownership mentality: you enjoy taking responsibility for systems end-to-end.
Curious, hands-on, and motivated by solving complex technical challenges.
Clear communicator who can translate technical work into practical recommendations.
Thrives in a fast-paced environment where you can experiment, improve, and shape how things are done.

What we offer

Competitive fixed compensation based on experience and expertise.
Work on cutting-edge AI systems used globally.
Dynamic multi-disciplinary teams engaged in digital transformation.
Remote-first work model.
Long-term B2B contract.
20 days paid time off.
Apple gear.
Training & development budget.

Our core values at TheHRchapter
Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.

Candidate experience: A blend of automated and human-centered recruitment processes. Don't hesitate to ask us for feedback anytime.

Talent pool: We bring highly skilled, motivated candidates to our clients. Our candidates match our clients' company values and management styles.

Diversity and inclusion: There is no place for discrimination or intolerance. We care about diversity awareness and respect for all differences.