AI Quality Engineer


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 20 days ago
Vacancies: 1 Vacancy

Job Summary

About the Role

Testing AI systems is a fundamentally different problem from testing traditional software. Outputs are non-deterministic. "Correct" is often a spectrum. And the failure modes (hallucinations, drift, prompt injection) don't show up in unit tests. We need an engineer who understands this and can build the testing strategies, evaluation frameworks, and quality infrastructure to keep our agents reliable in production.

As an AI Quality Engineer, you'll design how we test intelligent agents, agentic workflows, and Foundation Layer capabilities. This is not a manual QA role: you'll write code, build evaluation pipelines, and create automated testing frameworks that run in CI/CD. You'll define what "quality" means for AI systems at AGS and build the systems to measure it.

You'll work across every solution the team builds, which means you'll have broad visibility into the architecture and a deep understanding of how our agents behave in the real world. If you're an engineer who cares about quality and wants to solve testing problems that most teams haven't figured out yet, this is the role.

Responsibilities

 

Testing Strategy & Design

  • Define testing strategies for AI agents, conversational interfaces, and agentic workflows
  • Design behavioral test suites for non-deterministic outputs, where "correct" isn't binary
  • Build evaluation frameworks that measure groundedness, factuality, relevance, and task completion
  • Identify failure modes specific to AI systems: hallucinations, prompt injection, context window limitations, drift
  • Develop testing approaches for each architecture pattern: RAG, function calling, human-in-the-loop, autonomous workflows

 

Test Automation & Infrastructure

  • Build automated evaluation pipelines that run as part of CI/CD
  • Create test harnesses for LLM-based systems: mocking, fixtures, and reproducible test scenarios
  • Develop regression suites that detect quality degradation when prompts, models, or data change
  • Build monitoring and alerting for production agent quality (accuracy, latency, error rates)
  • Maintain test infrastructure: test data management, environment setup, reporting dashboards

 

Evaluation & Metrics

  • Define quality metrics for each solution: what to measure and what thresholds matter
  • Build and maintain evaluation datasets (ground truth, reference outputs, edge case collections)
  • Conduct systematic prompt evaluation when prompts or models change
  • Track quality trends over time and identify when re-evaluation is needed
  • Report quality metrics to the team and stakeholders in clear, actionable terms

 

Collaboration & Quality Culture

  • Partner with AI Solutions Engineers to define testability requirements during design
  • Work with AI Solutions Analysts to translate acceptance criteria into test scenarios
  • Review solution designs from a quality and testability perspective
  • Advocate for quality practices across the team: testing isn't an afterthought, it's part of delivery
  • Contribute to incident response by diagnosing quality failures and building regression tests

Qualifications :

 

Required

  • 3 to 7 years of software engineering or quality engineering experience
  • Strong programming skills in Python and/or TypeScript: you write test code, not just test cases
  • Experience designing and building automated test frameworks
  • Understanding of AI/ML systems: you know why testing LLM outputs is different from testing deterministic code
  • Experience with CI/CD pipelines and integrating automated tests into build processes
  • Ability to reason about non-deterministic systems and design meaningful quality metrics
  • Strong analytical skills: you can look at agent outputs and determine whether they're good enough

 

Preferred

  • Experience testing AI/ML applications, conversational interfaces, or chatbots
  • Background in LLM evaluation: prompt testing, groundedness scoring, factuality checking
  • Familiarity with evaluation frameworks (DeepEval, Ragas, custom evaluation pipelines)
  • Experience with Microsoft Power Platform (Power Automate, Copilot Studio) testing
  • Background in Azure services and cloud-based test infrastructure
  • Experience with load testing and performance testing for API-based systems
  • Familiarity with staffing, HR tech, or workforce management domains

 

Technology Stack

  • Languages: Python, TypeScript
  • Platforms: Azure (Container Apps, Functions, AI Services), Microsoft 365
  • Testing: pytest, evaluation frameworks (DeepEval, Ragas, custom), load testing tools
  • AI/ML: LLM evaluation, prompt testing, RAG evaluation, behavioral testing
  • Data: REST APIs, Dataverse, SQL
  • Tools: Git, GitHub, CI/CD pipelines, Docker, monitoring/alerting (Application Insights)

We don't expect expertise in everything. AI quality engineering is a new discipline: we expect strong engineering fundamentals and the ability to figure out new problems.

 

What We're NOT Looking For

  • Manual testers who write test cases in spreadsheets
  • QA professionals who treat testing as a gate at the end of development rather than a practice woven into it
  • People who expect deterministic pass/fail for every test: AI quality requires nuance
  • Engineers who test to the spec but don't think about how real users will break things

 

What Makes You Stand Out

  • You've tested a system where "correct" was hard to define, and found a way to measure it anyway
  • You write test code that's as clean and maintainable as production code
  • You think about edge cases that nobody else considers
  • You can explain why a particular quality metric matters and what threshold makes sense
  • You've built test automation that actually caught regressions before they hit production
  • You're comfortable saying "this isn't good enough" and backing it up with data

 

What We're Building

The AI Engineering team delivers intelligent solutions for AGS's global clients:

  • Intelligent Agents: Conversational AI that helps hiring managers, recruiters, and internal teams get work done faster
  • Agentic Workflows: Automated processes where AI executes tasks with human oversight
  • Foundation Capabilities: Reusable AI services that power multiple solutions

You'll make sure these systems work reliably, not just at launch but as models change, data evolves, and usage scales.

 

Career Growth

AI quality engineering is an emerging discipline with no ceiling. Growth paths include:

  • Depth: Become the team's authority on AI evaluation and testing methodology, influencing quality standards across the organization
  • Breadth: Move into a Senior or Lead AI Solutions Engineer role, bringing your quality mindset to architecture and delivery
  • Specialization: Build expertise in areas like LLM security testing, AI safety, or evaluation research

Additional Information :

As a workplace, we focus on relationships with each other, our clients, and our candidates - in fact, serving others is one of our core values. We support open communication and recognize that giving constructive criticism can be even harder than receiving it. We appreciate the fearless and the passionate who force us to be better. Everything we do sits on a pillar of diversity - diverse perspectives, backgrounds, and ideas drive innovation and make us successful.

See what it's like to work at AGS by searching #LifeAtAGS on any social network.


Remote Work :

No


Employment Type :

Full-time

About Company

Working at Allegis Global Solutions (AGS) is more than just a job. It’s a career. It’s a community of people who invest in your development and empower you to blaze your own trail. Each of us is here to create real, measurable impact that moves needles. We operate beyond "roles" or "j ...