Python Developer (AI Evaluation Frameworks)
Job Summary
We are seeking a seasoned Python Engineer with 5-7 years of professional experience and exposure to QA practices to join our team focused on the development of AI evaluation frameworks. The ideal candidate combines hands-on Python engineering skills, a QA mindset, and practical familiarity with GenAI/LLM concepts and Azure cloud services. You will design, build, and maintain scalable evaluation systems, working closely with QA teams and stakeholders to ensure robust, repeatable assessment of AI components.
Key responsibilities
Design and implement AI evaluation frameworks and tooling for model assessment, benchmarking, and automated testing of LLMs, agents, and GenAI features.
Build production-grade Python applications and APIs to support evaluation pipelines and integrations.
Collaborate with the QA team to brainstorm current evaluation challenges and build reproducible evaluation workflows.
Implement end-to-end evaluation pipelines, including data preprocessing, metric computation, test orchestration, and reporting.
Ensure code quality and maintain coding standards through static analysis, unit/integration tests, code reviews, and tooling (e.g., SonarQube).
Contribute to the design and implementation of APIs and services.
Deploy and operate evaluation components on Azure, leveraging platform services and following infrastructure-as-code practices.
Instrument monitoring, logging, and alerting for evaluation pipelines; capture audit trails and results for compliance and reproducibility.
Partner with data scientists, ML engineers, and product stakeholders to gather requirements, validate evaluation approaches, and incorporate feedback.
Support peers in troubleshooting and resolving issues across development and QA; mentor junior developers and share best practices.
Maintain documentation for evaluation frameworks, runbooks, and related materials.
Build, execute, optimize, and monitor unit tests and test plans to ensure quality, security, and consistency; detect, analyze, report, and resolve malfunctions, incidents, and bugs.
Required qualifications
5-7 years of professional Python development experience with strong, demonstrable hands-on skills.
Solid understanding of OOP concepts, software design principles, and coding best practices.
Experience with test-driven development, writing unit and integration tests, and collaborating with QA teams on automated testing.
Familiarity with the full project lifecycle: requirements, design, development, code review, deployment, maintenance, and deprecation.
Experience building RESTful APIs using FastAPI, Flask, or Django.
Practical experience with Azure cloud services and deployment patterns (App Services, AKS, Azure Functions, Blob Storage, DevOps pipelines).
Exposure to CI/CD tooling and code quality tools such as SonarQube.
Working knowledge of AI/DS concepts, particularly GenAI, LLMs, RAG patterns, and agent architectures.
Strong problem-solving and debugging skills, and the ability to work across distributed systems.
Excellent communication skills and a demonstrated ability to work closely with QA, data science, and product teams.
Desirable (good-to-have)
Experience with LLM frameworks such as LangChain, LlamaIndex, or similar.
Familiarity with observability tools and ML/LLM monitoring.
Prior experience designing evaluation metrics for NLP/LLM tasks (e.g., BLEU/ROUGE, embeddings-based similarity, human evaluation orchestration).
Prior knowledge of and experience working on traditional AI/ML systems.
Behavioral competencies
Mindset: attention to detail, a focus on testability and reproducibility, and a strong commitment to accuracy, quality, and safety.
Collaborative: able to partner effectively with QA, ML, and product stakeholders.
Proactive communicator: gathers feedback, surfaces risks early, and drives adoption of evaluation tooling.
Mentorship orientation: supports and uplifts team members through knowledge sharing.
Required Experience:
IC
About Company
MICHELIN tires and services adapted to your mobility. Find the right tire for your vehicle, advice from our experts, and dealers in France.