Who we are:
Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice to 4 out of 5 of the world's biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.
By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we're helping usher in the promise of clean and optimized digital data to all industries. Innodata offers a powerful combination of both digital data solutions and easy-to-use, high-quality platforms.
Our global workforce includes over 3,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel, and Germany. We're poised for a period of explosive growth over the next few years.
Position Summary:
Innodata is building a team of Language Data Scientists and Gen AI experts to help our customers advance GenAI applications. You will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate must have the right mix of skills in (computational) linguistics and human evaluation tasks, data science, and data engineering.
Who We're Looking For:
You have at least 3 years of relevant experience with data creation, curation, and analysis for GenAI applications (e.g., RAG, Agents, complex reasoning). You are an expert in designing collection, evaluation, and quality assurance processes using human-in-the-loop and synthetic techniques. You bring a wealth of expertise in language, culture, and multi-lingual projects. You are experienced in analyzing data with advanced statistical tools and driving success through process excellence.
Your understanding of machine learning, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) helps you tackle challenges with a critical, innovative mindset. You're also a strong communicator, excelling in cross-functional collaboration and understanding business needs.
Tell Me More:
As a Language Data Scientist, you create and own processes for creating, validating, and annotating data for use in LLM/ML applications. This can be natural language data or multimodal data, including images, video, audio, and others. You consult and engage with customers to understand their business goals and design processes to meet them. You generate insights about the client's processes and products to drive improvement and innovation. You advise and support business unit heads on engaging with customers to understand the upstream activities that would be performed using Innodata Inc. services.
Responsibilities:
Design/improve workflows to create data for AI/ML training and evaluation. Includes human annotation and data collection workflows as well as synthetic ones.
Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
Critically assess annotation tooling and workflows
Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.
Knowledge of how components of GenAI products or services combine to work
Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred
Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks including multi-phase and complex workflows.
Deep understanding of language and its relationship with culture
Ability to identify ambiguity and subjectivity in language
Ability to work with multi-lingual and multi-modal projects
Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
Technical skills:
Experience with Natural Language Processing (NLP) techniques and tools such as spaCy, NLTK, or Hugging Face.
Proficiency in Python to:
handle / transform large datasets (e.g., pre- and post-processing data with pandas)
perform quantitative analyses
visualize data (for example, matplotlib, seaborn)
Data processing:
Deep understanding of data pipelines to support ML and NLP workflows
Knowledge of efficient data collection, transformation, and storage
Knowledge of data structures, algorithms, and data engineering principles
Excellent interpersonal skills for effective cross-functional stakeholder engagement
Excellent problem-solving skills with the ability to think critically and creatively to develop innovative AI solutions
Ability to work independently and collaborate as part of a team
Adaptable to changing technologies and methodologies
Ability to translate experience, research, and development information to understand client products and services.
Preferred Skills
Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques
Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency
Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation
Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance
Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders
Contributing to establishing best practices and standards for generative AI development with customers and within the organization
Providing technical mentorship and guidance to junior team members
Understanding of techniques such as GPT, VAEs, and GANs
Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more on how to recognize job scams, please visit the Federal Trade Commission's guide. If you believe you've been targeted by a recruitment scam, please report it to Innodata and consider reporting it to the FTC.
Required Experience:
IC