Product Manager, AI Evaluation & Developer Platforms

Techifide


Job Location: São Paulo, Brazil

Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Job Summary

About the Opportunity

We're partnering with a fast-growing AI company building advanced, enterprise-grade artificial intelligence solutions. Their products help organisations transform complex information into actionable insights through cutting-edge machine learning, agentic systems and intelligent automation.

As the company continues to scale, they are seeking an experienced Product Manager to lead the strategy and execution of their AI Evaluation and Developer Tooling ecosystem.

This is a highly technical role at the intersection of Product Management, Machine Learning, AI Quality Assurance and Developer Experience.

The Role

As Product Manager for AI Evaluation & Developer Platforms, you will own the systems that measure, validate and improve AI performance across the organisation.

You'll define how AI quality is assessed, how model behaviour is benchmarked, and how internal teams can rapidly identify issues, compare results and ship improvements with confidence.

Working closely with engineering, data science, research and subject-matter experts, you'll shape the tools and frameworks that enable continuous improvement across AI products and services.

This position requires someone who is comfortable discussing evaluation metrics one moment and developer workflows the next.

Key Areas of Ownership

Data Quality & Ingestion Evaluation

Develop frameworks that assess the accuracy, completeness and reliability of incoming data.

You'll help identify:

  • Data extraction issues
  • Parsing and transformation errors
  • Schema inconsistencies
  • Data loss across pipelines
  • Quality degradation before it impacts downstream systems

Agent Performance Evaluation

Design methods for measuring how AI agents plan, reason and complete multi-step tasks.

Areas of focus include:

  • Task completion success
  • Reasoning quality
  • Decision-making robustness
  • Workflow execution
  • Agent reliability under changing conditions

Tool Usage & Execution Assessment

Create evaluation frameworks that verify whether AI systems:

  • Select appropriate tools
  • Pass correct parameters
  • Interpret responses accurately
  • Recover from failures gracefully
  • Execute workflows reliably

Responsibilities

  • Own the roadmap for AI evaluation frameworks and internal developer tooling.
  • Define quality standards, benchmarks, scoring methodologies and success metrics.
  • Create detailed product requirements, acceptance criteria, user stories and functional specifications.
  • Partner closely with engineering teams throughout delivery cycles.
  • Build systems that enable teams to create datasets, run experiments, analyse results and compare model performance.
  • Develop workflows that incorporate expert review, annotation and human feedback.
  • Track the adoption, effectiveness and business impact of evaluation tools.
  • Define and monitor KPIs and OKRs for quality, coverage, usability and platform performance.
  • Collaborate with design teams to deliver intuitive developer experiences and evaluation dashboards.
  • Present findings and recommendations to technical and executive stakeholders.
  • Ensure traceability from business requirements through to delivered capabilities.

Requirements

Essential

  • 7 years of Product Management experience within highly technical environments.
  • Previous experience as an ML Engineer, AI Engineer, Applied Scientist or equivalent hands-on AI role.
  • Deep understanding of AI evaluation methodologies and benchmarking techniques.
  • Strong knowledge of Large Language Models (LLMs), AI agents, retrieval systems and modern machine learning workflows.
  • Experience defining metrics, automated testing strategies and quality assurance processes for AI products.
  • Familiarity with agent architectures, tool calling, API integrations and multi-step reasoning systems.
  • Experience building products for technical users, such as engineers, researchers, analysts or data scientists.
  • Ability to write clear, testable and actionable product requirements.
  • Strong stakeholder management and communication skills.
  • Experience working within Agile product development environments.

Desirable

  • Experience building evaluation frameworks for LLM, RAG or agent-based applications.
  • Knowledge of data ingestion, ETL, data quality monitoring or data governance.
  • Experience with annotation platforms, human feedback systems or labelling workflows.
  • Exposure to regulated or domain-specific industries, such as government, healthcare, legal or financial services.

Why This Role

This is a rare opportunity to help define how AI quality is measured at scale.

You'll influence the evaluation standards, developer tooling and decision-making processes that underpin next-generation AI systems, while working alongside highly technical teams solving complex, real-world challenges.
