Functional AI Tester - GenAI

Pune - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Functional AI Tester - GenAI

About the Role

You will be involved in QA for GenAI features, including Retrieval-Augmented Generation (RAG), conversational AI, and agentic evaluations. The role centers on:

Systematic GenAI evaluation (qualitative and quantitative metrics)
ETL and data quality testing for the data flows that feed AI systems
Python-driven automated testing

This position is hands-on and collaborative: you will partner with AI engineers, data engineers, and product teams to define measurable acceptance criteria and ship high-quality AI features.

Key Responsibilities

Test strategy and planning
- Define risk-based test strategies and detailed test plans for GenAI features.
- Establish clear acceptance criteria with stakeholders for functional, safety, and data quality aspects.
Python test automation
- Build and maintain automated test suites using Python (e.g., PyTest, requests).
- Implement reusable utilities for prompt/response validation, dataset management, and result scoring.
- Create regression baselines and golden test sets to detect quality drift.
GenAI evaluation
- Develop evaluation harnesses covering factuality, coherence, helpfulness, safety, bias, toxicity, and similar dimensions.
- Design prompt suites, scenario-based tests, and golden datasets for reproducible measurements.
- Implement guardrail tests, including prompt-injection resilience, unsafe content detection, and PII redaction checks.
- Track quality metrics over time.
RAG and semantic retrieval testing
- Verify alignment between retrieved sources and generated answers.
- Verify system behavior under adversarial test cases.
- Measure retrieval relevance (precision/recall), grounding quality, and hallucination reduction.
API and application testing
- Test REST endpoints supporting GenAI features (request/response contracts, error handling, timeouts).
ETL and data quality validation
- Test ingestion and transformation logic; validate schemas, constraints, and field-level rules.
- Implement data profiling, reconciliation between sources and targets, and lineage checks.
- Verify data privacy controls, masking, and retention policies across pipelines.
Non-functional testing
- Performance and load testing focused on latency, throughput, concurrency, and rate limits for LLM calls.
- Cost-aware testing (token usage, caching effectiveness) and timeout/retry behavior validation.
- Reliability and resilience checks including error recovery and fallback behavior.
Share results and insights; recommend remediation and preventive actions.
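For illustration only, the sketch below shows one way the retrieval precision/recall measurement listed above might look as a Python test against a golden set. It assumes retrieval results are available as ranked lists of document IDs; `GoldenCase`, `precision_recall_at_k`, the document IDs, and the 0.6 threshold are hypothetical and not taken from the posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    """Hypothetical golden-set entry: a query plus the IDs of documents judged relevant."""
    query: str
    relevant_ids: frozenset[str]


def precision_recall_at_k(
    retrieved_ids: list[str], relevant_ids: frozenset[str], k: int
) -> tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    precision@k = relevant hits in the top k / k
    recall@k    = relevant hits in the top k / number of relevant documents
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop duplicate IDs (keeping first-occurrence rank order) so a repeated
    # document cannot be counted as more than one hit.
    top_k = list(dict.fromkeys(retrieved_ids))[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def test_retrieval_meets_threshold():
    # Hypothetical golden case and retriever output; a real suite would call
    # the system under test instead of using a hard-coded list.
    case = GoldenCase("example query", frozenset({"doc-12", "doc-40"}))
    retrieved = ["doc-12", "doc-07", "doc-40", "doc-99"]
    precision, recall = precision_recall_at_k(retrieved, case.relevant_ids, k=3)
    assert precision >= 0.6  # hypothetical acceptance threshold
    assert recall == 1.0
```

Under PyTest, `test_retrieval_meets_threshold` would be collected automatically. In practice the thresholds would be the measurable acceptance criteria agreed with stakeholders, and the same scores could be tracked over time to detect quality drift.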

Required Qualifications

Experience
- 5 years in software QA, including test strategy, automation, and defect management.
- 2 years testing AI/ML or GenAI features with hands-on evaluation design.
- 4 years testing ETL/data pipelines and data quality.
Technical skills
- Python: Strong proficiency building automated tests and tooling (PyTest, requests, pydantic, or similar).
- API testing: REST contract testing, schema validation, and negative testing.
- GenAI evaluation: crafting prompt suites and golden datasets, rubric-based scoring, and automated evaluation pipelines.
- RAG testing: retrieval relevance, grounding validation, chunking/indexing verification, and embedding checks.
- ETL/data quality: schema and constraint validation, reconciliation, lineage awareness, and data profiling.
Quality and governance
- Understanding of LLM limitations and methods to detect/reduce hallucinations.
- Safety and compliance testing including PII handling and prompt-injection resilience.
- Strong analytical and debugging skills across services and data flows.
Soft skills
- Excellent written and verbal communication; ability to translate quality goals into measurable criteria.
- Collaboration with AI engineers, data engineers, and product stakeholders.
- Organized, detail-oriented, and outcomes-focused.

Nice to Have

Experience with evaluation frameworks or tooling for LLMs and RAG quality measurement.
Experience creating synthetic datasets to stress specific behaviors.

About Company

Michelin

MICHELIN tires and services tailored to your mobility. Find the right tire for your vehicle, advice from our experts, and dealers in France.
