Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)

TheHRchapter

Job Location:

Porto - Portugal

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

We are hiring a senior MLOps/DevOps/SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven't designed production-grade MLOps infrastructure, haven't built CI/CD for ML, or haven't deployed ML workloads on Kubernetes at scale, this role is not a fit.

You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that support our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.

Location: Remote - Europe (PL/ES/PT/CZ/CY)

Key Responsibilities:

MLOps Platform Architecture (from scratch)

  • Design and build AWS-based AI/ML infrastructure using Terraform (required).
  • Define standards for security, automation, cost efficiency, and governance.
  • Architect infrastructure for ML workloads, GPU/accelerators, scaling, and high availability.

Kubernetes & Model Deployment

  • Architect, build, and operate production Kubernetes clusters.
  • Containerize and productize ML models (Docker, Helm).
  • Deploy latency-sensitive and high-throughput models (ASR/TTS/NLU/Agentic AI).
  • Ensure GPU and accelerator nodes are properly integrated and optimized.

CI/CD for Machine Learning

  • Build automated training, validation, and deployment pipelines (GitLab/Jenkins).
  • Implement canary, blue-green, and automated rollback strategies.
  • Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability

  • Implement full observability (Prometheus, Grafana).
  • Own uptime, performance, and reliability for ML production services.
  • Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership

  • Work closely with ML engineers, researchers, and data scientists.
  • Translate experimental models into production-ready deployments.
  • Define best practices for MLOps across the company.


    Qualifications and Skills:

    We're looking for a senior engineer with a strong DevOps/SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure automation and hands-on MLOps experience.

    • 5 years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
    • Strong experience designing, building, and maintaining Kubernetes clusters in production.
    • Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
    • Solid programming skills in Python or Go for building automation tooling and ML workflows.
    • Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
    • Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM/Agentic AI).
    • Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
    • Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
    • Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
    • Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
    • Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

    Ways to Know You'll Succeed
    • You enjoy building platforms from the ground up and owning technical decisions.
    • You're comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
    • You like solving performance, automation, and reliability challenges in distributed systems.
    • You bring a structured, pragmatic, and scalable approach to infrastructure design.
    • Energetic and proactive individual with a natural drive to take initiative and move things forward.
    • Enjoys working closely with people - researchers, ML engineers, cloud architects, product teams.
    • Comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
    • Collaborative mindset: you like to build together, not work in isolation.
    • Strong ownership mentality - you enjoy taking responsibility for systems end-to-end.
    • Curious, hands-on, and motivated by solving complex technical challenges.
    • Clear communicator who can translate technical work into practical recommendations.
    • Thrives in a fast-paced environment where you can experiment, improve, and shape how things are done.


    What we offer

    • Competitive fixed compensation based on experience and expertise.
    • Work on cutting-edge AI systems used globally.
    • Dynamic, multi-disciplinary teams engaged in digital transformation.
    • Remote-first work model
    • Long-term B2B contract
    • 20 days paid time off
    • Apple gear
    • Training & development budget



      Our Core values at TheHRchapter
      Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.

      Candidate experience: Perfect blend between automated and humanized recruitment processes. Don't hesitate to ask us for feedback anytime.

      Talented pool: We bring highly skilled, motivated candidates to our clients. Our candidates match their company values and management style.

      Diversity and inclusion: There is no place for discrimination and intolerance. We care about diversity, awareness, and respect for any differences.

      Key Skills

      • Apache Hive
      • S3
      • Redshift
      • Spark
      • AWS
      • Solr
      • NoSQL
      • Data Warehouse
      • Internet Of Things
      • Kafka
      • DynamoDB
      • ZooKeeper

      About Company

      Your Strategic Partner for HR, Payroll & Headhunting Solutions
