Senior Data Scientist LLM

Job Location

Gipuzkoa - Spain

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Multiverse Computing

Multiverse is a well-funded and fast-growing deep-tech company founded in 2019. We are the biggest Quantum Software company in the EU. We are also one of the 100 most promising companies in AI in the world (according to CB Insights, 2023), with 150 employees and growing, fully multicultural and international.

We provide hyper-efficient software to companies seeking to gain an edge with quantum computing and artificial intelligence. Our main products, Singularity and CompactifAI, address critical needs across various industries. Singularity remains a trusted solution for blue-chip companies in finance, energy, manufacturing, cybersecurity, and more. CompactifAI, on the other hand, is a groundbreaking compression tool for foundation models that uses Tensor Networks to heavily compress AI systems such as large language models, making them efficient and portable.

You will be working alongside world-leading experts to build solutions that tackle real-life issues. We look for passionate people who want to grow in an ethics-driven environment that promotes sustainability and diversity. We aim to continue building our truly inclusive culture. Come and join us.

We are seeking a Senior Data Scientist with deep expertise in creating high-quality datasets for training and fine-tuning Large Language Models (LLMs). You will be responsible for designing and implementing scalable data pipelines and strategies to support all stages of LLM development: pretraining, supervised fine-tuning, and reinforcement learning from human feedback (RLHF).

This role is critical to ensuring the robustness, safety, and alignment of our AI models. You will have the autonomy to explore innovative data sourcing and curation methods and the opportunity to directly influence the capabilities of state-of-the-art LLMs.

As a Senior Data Scientist, you will:

  • Design and implement strategies for creating, sourcing, and augmenting datasets tailored for LLM training and fine-tuning.
  • Develop scalable pipelines to collect, clean, filter, annotate, and validate large volumes of text data.
  • Conduct data audits to ensure quality, diversity, ethical compliance, and bias mitigation.
  • Collaborate with ML engineers and researchers to align datasets with training objectives and model evaluation needs.
  • Use techniques such as active learning, synthetic data generation, and self-supervised learning to maximize dataset efficiency.
  • Leverage human-in-the-loop (HITL) workflows for data labeling and validation where necessary.
  • Contribute to building data documentation and metadata standards (e.g., Datasheets for Datasets).
  • Keep up to date with research trends in dataset curation, LLM pretraining data, and benchmarking.

Required Qualifications

  • Bachelor's, Master's, or Ph.D. in Computer Science, AI, Data Science, or a related field.
  • 3 years of experience in data science, machine learning, or related roles, with demonstrated experience in dataset creation for NLP or LLMs.
  • In-depth knowledge of the LLM lifecycle: pretraining, fine-tuning, alignment, and evaluation.
  • Proficient in Python and data tooling ecosystems (Pandas, NumPy, spaCy, Hugging Face Datasets & Transformers).
  • Hands-on experience with text data collection from diverse sources: web scraping, APIs, proprietary corpora, etc.
  • Strong understanding of data quality metrics, including bias detection, toxicity, and readability.
  • Experience working with annotation tools (e.g., Prodigy, Label Studio) and managing annotation teams or workflows.

Preferred Qualifications

  • Experience building or contributing to datasets used in LLM pretraining or supervised fine-tuning.
  • Familiarity with RLHF workflows and alignment techniques (e.g., preference modeling, reward modeling).
  • Exposure to multilingual and low-resource language datasets.
  • Contributions to open-source datasets, tools, or publications in dataset-centric research.
  • Knowledge of ethical AI, data governance, privacy laws (e.g., GDPR), and responsible data use.

Perks & Benefits

  • Indefinite contract.
  • Equal pay guaranteed.
  • Variable performance bonus.
  • Signing bonus.
  • Work visa sponsorship (if applicable).
  • Relocation package (if applicable).
  • Private health insurance.
  • Eligibility for educational budget according to internal policy.
  • Hybrid opportunity.
  • Flexible working hours.
  • Language classes and discounted lunch options.
  • A fast-paced environment working on cutting-edge technologies.
  • Career plan. Opportunity to learn and teach.
  • Progressive company. Happy people culture.

As an equal opportunity employer, Multiverse Computing is committed to building an inclusive workplace. The company welcomes applications from people of all backgrounds, including all ages, citizenships, ethnic and racial origins, gender identities, marital statuses, religions and ideologies, and sexual orientations, as well as individuals with disabilities.

Employment Type

Full Time