Job Description: AI Task Evaluation & Statistical Analysis Specialist
Role Overview
We're seeking a data-driven analyst to conduct comprehensive failure analysis on AI agent performance across finance-sector tasks. You'll identify patterns, root causes, and systemic issues in our evaluation framework by analyzing task performance across multiple dimensions (task types, file types, criteria, etc.).
Key Responsibilities
- Statistical Failure Analysis: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
- Root Cause Analysis: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
- Dimension Analysis: Analyze performance variations across finance sub-domains, file types, and task categories
- Reporting & Visualization: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
- Quality Framework: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
- Stakeholder Communication: Present insights to data labeling experts and technical teams
Required Qualifications
- Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
- Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
- Data Analysis: Experience with exploratory data analysis and deriving actionable insights from complex datasets
- AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics
- Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL
Preferred Qualifications
- Experience with AI/ML model evaluation or quality assurance
- Background in finance or willingness to learn finance domain concepts
- Experience with multi-dimensional failure analysis
- Familiarity with benchmark datasets and evaluation frameworks
- 2-4 years of relevant experience