What you will do
Design and execute specialized evaluation and monitoring strategies to assess GenAI workflows, focusing on multi-step reasoning, tool-use reliability, and looping risks where agents may fail in autonomous tasks.
Critically evaluate and monitor GenAI-specific risks, including hallucinations, prompt injection vulnerability, and data leakage, ensuring that mitigation strategies (such as guardrails and RAG-based grounding) are robust and effective.
Conduct research on emerging evaluators (e.g., LLM-as-a-judge) and develop benchmarking standards to systematically assess GenAI application outputs, ensuring the system performs reliably on unstructured data where traditional statistical profiles do not apply.
Develop and execute comprehensive stress-testing protocols to assess GenAI soundness and identify potential risks.
Critically assess the completeness and accuracy of GenAI development documentation, code, and marketing materials.
Develop and implement innovative validation approaches for complex and nontraditional models including those with unstructured data and unique risk profiles.
Develop AI agent tools to automate the retrieval, wrangling, and analysis of data.
Utilize combined knowledge of data structures, analytics, algorithms/models, and strong computer science fundamentals to prepare datasets, conduct analytics, and develop deployable solutions with guidance from more senior resources.
Develop and deploy AI and ML solutions on Google Cloud Platform.
Utilize massive data sources to craft business insights and features for innovative solutions.
Understand diverse data sources, both structured and unstructured.
Required Experience:
Senior IC
VetJobs & Military Spouse Jobs works with our employer partners to source, screen, and move qualified talent to the desktops of the Hiring Managers. Application is a two-step process, so please be patient with the team. When you submit to a position on our site your information will ...