AI Evaluation & Safety Test Engineer – Conversational AI, Automation, and Responsible AI Standards

Pune - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Synechron is seeking a skilled AI Agent Test Engineer to lead validation efforts for conversational and agentic AI. In this role, you will develop evaluation frameworks, test harnesses, and safety controls to ensure AI agents deliver accurate, secure, and compliant experiences for users. Your work will support the organization's commitment to responsible AI, high-quality outputs, and robust system performance, primarily within financial and banking domains. Your contributions will enhance client trust and operational excellence through rigorous testing and continuous monitoring of AI agent behaviors.

Software Requirements

Required:

Experience with QA automation frameworks and tools such as Selenium, TestNG, Maven, Jenkins, and JIRA
Strong programming skills in Java and Python, with experience in automating API and UI tests
Knowledge of AI evaluation pipelines, including prompt validation, safety checks, and agent output assessment
Familiarity with chatbot and conversational AI frameworks and agent architectures
Experience designing and executing end-to-end test scenarios and safety protocols for AI systems
Experience with CI/CD integration, telemetry, and observability tools

Preferred:

Exposure to experiment tracking and version control systems for managing prompts, datasets, and configurations
Knowledge of vector databases, embeddings, and retrieval metrics for RAG systems
Familiarity with safety tooling, responsible-AI frameworks, and governance standards (e.g., fairness, bias, PII, privacy)

Overall Responsibilities

Design, develop, and execute automated evaluation harnesses to validate agent responses, safety, and performance
Build test scenarios that evaluate multi-turn conversations, task success, helpfulness, and policy adherence
Validate tool and function call schemas, error handling, retries, and resilience to failures
Assess retrieval-augmented generation (RAG) quality, including accuracy, grounding, citations, and indexing
Conduct safety testing, including prompt injection, jailbreak, content moderation, and escalation logic
Monitor runtime KPIs such as accuracy, resolution rate, latency, and token usage; develop dashboards and trend analyses
Track prompt, configuration, and safety rule changes, and validate new agent versions via shadow testing and evaluation thresholds
Develop and maintain automated tests for APIs, UI, and databases where applicable
Participate in Agile ceremonies, including sprint planning, backlog refinement, and retrospectives
Document testing strategies, results, and safety audit reports for compliance and governance purposes
Support continuous improvement initiatives to strengthen test coverage, reliability, and compliance

Technical Skills (By Category)

Programming Languages (Essential):

Java and Python for automation and evaluation scripting

Preferred:

Other languages such as JavaScript, or notebooks for data analysis and report generation

Testing & Evaluation Tools:

Selenium, TestNG, Maven, and Jenkins for automation pipelines
API validation tools, e.g., Postman, RestAssured (preferred)
Evaluation frameworks for AI model assessment, version control, and experiment tracking tools

AI & Retrieval Systems:

Knowledge of retrieval-augmented generation (RAG) architecture, embeddings, and retrieval metrics
Experience testing content grounding, citation correctness, and index coverage

Data & Monitoring:

SQL for database querying and validation
Telemetry tools: Prometheus, Grafana, JFR, JMC, or similar for performance monitoring
Dashboard creation and trend analysis for runtime KPIs

Security & Compliance:

Familiarity with responsible-AI principles, bias mitigation, privacy standards, and content moderation policies

Experience Requirements

Minimum 5 years of experience in QA/test automation environments, with a specific focus on AI, NLP, or conversational agents
Proven success in designing, implementing, and maintaining evaluation harnesses for AI systems
Experience in testing safety, fairness, and compliance aspects of AI functionality
Hands-on experience in API and UI automation, with strong scripting and programming capabilities
Knowledge of enterprise AI tools, telemetry, and observability in regulated settings

Day-to-Day Activities

Develop and enhance automated evaluation and safety testing frameworks for conversational agents
Create multi-turn test scenarios, validate outputs, and track performance metrics
Investigate and troubleshoot issues related to agent safety, accuracy, and grounding
Collaborate with data scientists, product managers, and security teams to ensure high standards
Monitor system KPIs and create dashboards for ongoing performance analysis
Conduct shadow testing for new agent versions and validate against evaluation thresholds
Keep updated on responsible-AI standards, safety techniques, and emerging evaluation metrics
Document testing procedures, safety checklists, and compliance reports for audits

Qualifications

Bachelor's or Master's degree in Computer Science, AI, Data Science, or related fields
5 years of experience in QA automation, particularly with conversational AI systems
Proven expertise in evaluation methodologies, safety testing, and model validation
Experience with API, UI, and database automation tools in enterprise environments
Certifications or training in AI ethics, safety, or responsible-AI frameworks (preferred)

Professional Competencies

Strong analytical and critical thinking skills for complex AI validation tasks
Excellent communication skills for cross-team collaboration and documentation
Leadership ability to guide junior engineers and foster best testing practices
Adaptability to rapidly evolving AI safety standards and regulatory landscapes
Detail-oriented approach ensuring thorough testing coverage and compliance
Proactive learning attitude towards responsible-AI principles and emerging evaluation tools

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference", is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Candidate Application Notice

Required Experience:

Key Skills

Google Analytics
Automation
ASP.NET
Automation Testing
Electrical & Automation

About Company

Synechron

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategies ...