Senior ML Ops / LLM Ops Engineer
Job Summary
Overview:
This role focuses on building and operating the ML Ops / LLM Ops pipeline that closes the production feedback loop: ingest production signal, redact it, store it, slice it, classify it, surface the failures, mine new eval cases, and alert on regressions. You drive the toolchain decisions, the data-governance posture, and the day-to-day reliability of the pipeline itself. The Head of AI sets vision and priorities, and you own the technical execution end-to-end.
What will you do
- Design and build a source-agnostic ingestion pipeline for production ML / LLM traffic
- Design storage tiering based on automotive and company requirements, policy-driven retention windows, and privacy requirements
- Build slicing dashboards and the query path engineers use to debug production at 11 p.m.
- Enable autoraters and lightweight LLM classifiers across production traffic
- Build the rule-based triage layer for obvious failures
- Stand up the eval-mining workflow and wire regression alerts to model and prompt deploys
- Implement PII redaction at the ingestion boundary and safety / abuse classification on inbound content
- Define dashboard architecture, wipeout mechanisms, and tool and hosting selection, and operate the pipeline end-to-end
What are we looking for
Must Have
- Proven experience building and operating data or ML platform systems in production, covering ingest, schema, storage, access control, alerts, and on-call
- Hands-on experience building and running ML / LLM evaluation systems in production (offline regression sets, online autoraters, LLM-as-judge pipelines, golden datasets)
- Hands-on experience with LLM tracing and observability tooling
- Experience shipping PII redaction or comparable data-handling controls in a regulated or multi-tenant environment, with a pragmatic approach to data governance
- Strong understanding of how ML and LLM-based systems fail in production: hallucination, retrieval failures, agent loops that don't terminate, ASR / TTS degradation, and prompt or model regressions across deploys
- Production Python proficiency; a hands-on engineer, not an advisor. Comfortable leveraging AI in everything you build
Nice to Have
- Multi-tenant or white-label SaaS experience with per-tenant data isolation
- Azure experience and the ability to weigh tradeoffs when making self-host vs. managed SaaS calls
- Experience with autorater methodology and contamination defenses
- Knowledge of vector databases, embedding-based clustering, or unsupervised failure-mode discovery
- Experience with data-versioning tooling (lakeFS, DVC, Delta Lake)
- GDPR / right-to-erasure work
- Embedded, automotive, or other constrained-environment context
- Working knowledge of a language beyond English sufficient to validate non-English failure modes
- Prior experience with cloud platforms (Microsoft Azure and AWS)
- Prior experience with Claude Code
- Prior experience with GitHub
- Languages: Python primary, SQL, and some TypeScript for dashboards
- LLM APIs: Claude (Anthropic), OpenAI, and open-source models as needed
- Android/AAOS ecosystem as clients
What can you expect from us
- A permanent job contract for a long-term project;
- Tech equipment, SIM card, and personal smartphone;
- Health and Life Insurance;
- Social events and team buildings;
- A commitment to letting you grow with us and be rewarded accordingly;
- A dynamic and young team that will always be there to support you;
- Training in the latest technologies;
- Coffee, fruit, snacks, and a warm welcome when you pass by the office.
Required Experience:
Senior IC
About Company
At Caixa Mágica, we provide custom software development that drives business growth. Explore our services and see how we help companies.