Alexa: An AI assistant that gets things done. She's smarter, more conversational, and more capable. With 600 million Alexa devices now out in the world, the latest advancements in generative AI have unlocked new possibilities, enabling us to reimagine the experience in our pursuit of making customers' lives better and easier every day. Experts enable Alexa to complete complex tasks using Amazon and third-party (3P) APIs and websites, including playing your favorite music, controlling your smart devices, reserving a table at a restaurant, setting your morning alarms and kitchen timers, booking event tickets, scheduling an appointment, and planning a trip. No more jumping between apps: just say what you want, and it's done.
Developers and partners have been core to our vision since the beginning. With Alexa, we reimagined how developers can build for Alexa. The Alexa AI Developer Technology charter is to enable the creation of experts at scale and make it seamless for first-party (1P) and third-party (3P) developers to integrate their businesses with Alexa. We are measurably making Alexa smarter, and we need your help to define and build the next generation of capabilities using generative AI.
Key job responsibilities
As an SDET driving the creation of next-generation LLM-based evaluation systems for Alexa, you'll design and build the frameworks that define how conversational intelligence is measured, determining whether millions of daily interactions feel accurate, natural, and human-centered. Your work goes far beyond binary pass/fail tests: you'll engineer automated systems that assess accuracy, reasoning depth, tone, and responsiveness across multimodal, context-rich conversations.
In this role, traditional testing boundaries dissolve. You'll evaluate not just functional correctness but whether the AI's responses are contextually relevant, emotionally aligned, and conversationally fluent. Your systems will measure everything from factual accuracy and task completion to subtler attributes like dialog flow, personality coherence, and graceful curtailment. Partnering closely with scientists and engineers, you'll automate the detection of conversational regressions, identifying hallucinations, degraded reasoning, or misaligned tones before they reach customers. You'll leverage prompt-driven evaluation pipelines, LLM-as-a-Judge (LLMaaJ) frameworks, and reference-based validation to ensure assessments remain consistent, explainable, and scalable across model versions and releases.
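For illustration, here is a minimal sketch of what one reference-based LLMaaJ check might look like; the judge_model callable, the rubric prompt, and the JSON score schema are assumptions made for the example, not the team's actual framework.

```python
# Illustrative LLM-as-a-Judge sketch (assumed rubric and schema, not Alexa's pipeline).
import json
import re
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are evaluating an assistant's reply.
Conversation context:
{context}

Assistant reply:
{reply}

Reference answer (may be empty):
{reference}

Rate the reply from 1-5 for accuracy, relevance, and tone.
Respond as JSON: {{"accuracy": n, "relevance": n, "tone": n, "rationale": "..."}}"""

@dataclass
class JudgeResult:
    accuracy: int
    relevance: int
    tone: int
    rationale: str

def judge_reply(
    judge_model: Callable[[str], str],  # hypothetical wrapper around any LLM completion call
    context: str,
    reply: str,
    reference: str = "",
) -> JudgeResult:
    """Score one assistant turn with an LLM judge and parse its JSON verdict."""
    prompt = JUDGE_PROMPT.format(context=context, reply=reply, reference=reference)
    raw = judge_model(prompt)
    # Tolerate judges that wrap the JSON verdict in extra prose.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    verdict = json.loads(match.group(0)) if match else {}
    return JudgeResult(
        accuracy=int(verdict.get("accuracy", 0)),
        relevance=int(verdict.get("relevance", 0)),
        tone=int(verdict.get("tone", 0)),
        rationale=str(verdict.get("rationale", "")),
    )
```

In practice, judge_model would wrap whatever completion API the evaluation pipeline uses, and per-turn scores like these would be aggregated across conversation categories, model versions, and releases.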
You'll also collaborate with prompt engineers, model developers, and product teams to establish robust, category-specific testing methodologies, from quick one-shot actions and task fulfillment to multi-turn dialogues and creative open-ended interactions. Through your work, evaluation evolves from a quality gate into a self-improving assessment framework, one that learns, adapts, and ensures every voice interaction feels naturally conversational.
- Bachelor's degree or above in computer science, computer engineering, or a related field, or Associate's degree or above
- 4 years of experience as an SDET, Software Engineer, or QA Automation Engineer
- Proficiency in at least one programming language (e.g., Java, Python, or JavaScript)
- Hands-on experience building and maintaining automation frameworks for UI, API, and backend systems
- Strong understanding of API testing, data-driven validation, and test metrics
- Excellent problem-solving skills and ability to dive deep into complex systems to identify root causes
- Strong communication and collaboration skills, with a passion for driving quality across teams
- Knowledge of overall system architecture, scalability, reliability, and performance in a database environment
- Experience with security in service-oriented architectures and web services
- Master's degree in Computer Science or a related field
- Experience with AWS or cloud technologies
- Experience with LLM-based evaluation, prompt-driven testing, or LLM-as-a-Judge frameworks
- Hands-on experience using AI and LLM-based frameworks to automate quality testing and build intelligent evaluation systems
- Ability to design semantic and behavioral validation beyond functional correctness
- Proven record of collaborating with scientists and engineers to define measurable AI quality metrics
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit
for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.