AI Quality Engineer

Job Location: New York City, NY - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

Required Location: Hybrid/Midtown New York City or 100% Remote
This senior AI Quality Engineer role is critical in ensuring the accuracy, reliability, and relevance of outputs generated by large language model (LLM)-powered agents. You will be responsible for testing, validating, and fine-tuning AI agent responses, focusing on identifying hallucinations, misinterpretations, and inconsistencies within Capital Markets use cases. You will collaborate with AI/ML engineers to adjust prompts, context windows, guardrails, and training data inputs to continuously improve response quality, and you will conduct ground truth comparisons using curated datasets from Snowflake, Databricks, and other enterprise data sources.

Job Description:

We are seeking a highly skilled Capital Markets AI Quality Engineer to join our AI and GenAI innovation team. This role is critical in ensuring the accuracy, reliability, and relevance of outputs generated by large language model (LLM)-powered agents. You will be responsible for testing, validating, and fine-tuning AI agent responses, focusing on identifying hallucinations, misinterpretations, and inconsistencies within Capital Markets use cases.

You will collaborate closely with AI engineers, prompt engineers, data scientists, and domain experts to iteratively improve system performance. Experience with the OpenAI and Anthropic ecosystems and LLM fine-tuning, along with a strong understanding of Capital Markets (e.g., trading, risk, compliance, research, structured products), is essential. Data inputs will be centrally curated in Snowflake and Databricks, which you will use to validate agent outputs against trusted sources.

Key Responsibilities:

  • Test and validate GenAI-powered agents across Capital Markets use cases to ensure output accuracy, consistency, and business alignment.
  • Identify and document hallucinations, factual inaccuracies, and other AI output anomalies in both structured and unstructured response formats.
  • Collaborate with AI/ML engineers to adjust prompts, context windows, guardrails, and training data inputs to continuously improve response quality.
  • Conduct ground truth comparisons using curated datasets from Snowflake, Databricks, and other enterprise data sources.
  • Build and maintain automated and manual test suites for LLM-driven workflows and conversational agents.
  • Contribute to the design of evaluation metrics tailored to financial NLP use cases (e.g., accuracy, relevancy, regulatory compliance).
  • Perform regression testing as models are updated or fine-tuned, ensuring changes do not introduce new errors or degradation.
  • Collaborate with Capital Markets SMEs to define expected behaviors and validate financial terminology, logic, and compliance relevance.
  • Track and report model performance issues to stakeholders and help prioritize areas for improvement.
  • Stay current with evolving GenAI technologies, foundation models (OpenAI, Anthropic), and best practices in AI quality and alignment.

Required Qualifications:

  • 3 years of experience working with Generative AI / LLMs, including prompt tuning, fine-tuning, and output validation.
  • 3 years of experience in Capital Markets, including trading systems, investment banking, structured products, or risk and compliance.
  • Strong knowledge of OpenAI, Anthropic, or other foundation model ecosystems.
  • Familiarity with LLM evaluation frameworks and alignment techniques.
  • Hands-on experience working with Snowflake, Databricks, or similar data lake/data warehouse environments.
  • Solid understanding of NLP concepts, model limitations (e.g., hallucinations, context drift), and mitigation strategies.
  • Ability to write and execute both manual and automated test cases for complex AI systems.
  • Proficiency in Python or a similar scripting language for data validation and model interaction.
  • Excellent communication and collaboration skills with both technical and business stakeholders.

Preferred Qualifications:

  • Experience using tools such as LangChain, LLMGuard, or TruLens.
  • Prior experience working in AI product development, especially in financial services or enterprise environments.
  • Knowledge of regulatory compliance requirements in finance (e.g., FINRA, MiFID II, SEC) as they relate to AI transparency and accuracy.
  • Familiarity with model monitoring and drift detection tools in production.

Key Skills

  • APQP
  • Quality Assurance
  • Six Sigma
  • ISO 9001
  • PPAP
  • Minitab
  • Root Cause Analysis
  • ISO 13485
  • Quality Systems
  • Quality Management
  • AS9100
  • Manufacturing