Key Responsibilities

Test Planning & Execution
- Design and execute comprehensive test plans for applications integrating LLMs and AWS data pipelines.
- Test model outputs, APIs, and user interfaces, with a focus on LLM-driven features such as text generation and summarization.

Model Quality & Evaluation
- Develop test cases for LLM responses, ensuring accuracy, relevance, and consistency.
- Evaluate prompt quality, model outputs, and fine-tuned model behavior using both manual and automated approaches.
- Test RAG pipelines and validate integration with vector databases or knowledge bases.

Automation & Tooling
- Build and maintain automated test suites for backend services and LLM workflows using tools like PyTest, Postman, or Selenium.
- Integrate tests into CI/CD pipelines (e.g., using GitHub Actions, Jenkins, or CodePipeline).

AWS Integration Testing
- Test cloud-native services and workflows built with AWS tools such as: Lambda, S3, API Gateway, Step Functions, Bedrock, SageMaker, CloudWatch.
- Validate infrastructure-as-code using CloudFormation or CDK in test environments.

Performance, Security & Compliance Testing
- Conduct performance testing of LLM APIs (e.g., latency, throughput, model fallback behavior).
- Ensure compliance with data privacy, audit, and security standards when using LLMs in production.

Collaboration & Documentation
- Work closely with engineers, data scientists, and product managers to define quality metrics for LLM use cases.
- Document test strategies, bug reports, and QA processes.

Required Skills
- 5+ years of experience in software or data quality assurance.
- Hands-on experience testing applications built on AWS (Lambda, S3, API Gateway, CloudWatch, Step Functions, etc.).
- Experience testing or validating LLM-powered features (e.g., OpenAI, Amazon Bedrock, HuggingFace, Claude).
- Strong skills in test automation (e.g., PyTest, Selenium, REST API testing).
- Good understanding of CI/CD and DevOps in AWS.
- Proficiency with Python or JavaScript/TypeScript for test automation.

Nice to Have
- Familiarity with LLMOps, prompt testing frameworks, or prompt injection testing.
- Experience with LangChain, RAG patterns, or vector databases (e.g., Pinecone, FAISS).
- Exposure to model evaluation metrics (e.g., BLEU, ROUGE, factual consistency).
- Basic knowledge of data pipelines, ETL, or analytics.

Bonus Experience
- Testing GenAI or AI-enhanced applications in real-world settings.
- Understanding of ethical AI usage, data privacy, and compliance in AI systems.
- Experience monitoring LLM behaviour in production with observability and logging tools.
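As a concrete illustration of the kind of automated LLM test suite described above (PyTest-style assertions on accuracy, relevance, and consistency), here is a minimal sketch. The `fake_llm` stub, the keyword-coverage heuristic, and the thresholds are assumptions for the example only; in a real suite the client would call the deployed LLM API (e.g., via Amazon Bedrock) and might use proper metrics such as ROUGE.

```python
# Sketch of automated LLM response checks in a PyTest-style test.
# fake_llm is a placeholder standing in for a real LLM API call.

def fake_llm(prompt: str) -> str:
    """Stub for a real LLM API client (assumption for this sketch)."""
    canned = {
        "Summarize: The cat sat on the mat.": "A cat sat on a mat.",
    }
    return canned.get(prompt, "")


def keyword_coverage(response: str, required: set) -> float:
    """Fraction of required keywords found in the response.

    A crude unigram-recall proxy for relevance, similar in spirit to
    ROUGE-1 recall against a reference.
    """
    words = {w.strip(".,!?").lower() for w in response.split()}
    return len(required & words) / len(required)


def test_summary_is_relevant_and_consistent():
    prompt = "Summarize: The cat sat on the mat."
    response = fake_llm(prompt)
    # Accuracy/relevance: key entities from the source must survive the summary.
    assert keyword_coverage(response, {"cat", "mat"}) >= 0.5
    # Consistency: the same prompt should yield a stable answer across calls
    # (meaningful in practice with temperature=0 on the real API).
    assert fake_llm(prompt) == response
```

Such tests run under PyTest locally and slot directly into the CI/CD pipelines the role mentions (GitHub Actions, Jenkins, or CodePipeline).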