Data Scientist

Raleigh - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Title: Data Scientist

Location: Raleigh, NC (Onsite)

Job Type: Contract

As a data scientist on our team, you will work on new product development in a small team environment, writing production code in both run-time and build-time environments. You will help propose and build data-driven solutions for high-value customer problems by discovering, extracting, and modeling knowledge from large-scale natural language datasets, including matter and contract repository, invoice/legal spend, and work management data. You will prototype new ideas, collaborating with other data scientists as well as product designers, data engineers, front-end developers, and a team of expert legal data annotators. You will experience a start-up culture backed by the large datasets and many other resources of an established company.

RESPONSIBILITIES

Develop and implement LLM-based applications tailored for in-house legal teams
Fine-tune and deploy large language models to improve their performance on legal text processing tasks
Evaluate and help maintain our data assets and training/evaluation datasets
Design and build pipelines for preprocessing, annotating, and managing legal document datasets
Collaborate with legal experts to understand requirements and ensure models meet domain-specific needs
Conduct experiments and evaluate model performance to drive continuous improvement
Interface with other technical personnel and team members to finalize requirements
Work closely with other development team members to understand moderately complex product requirements and translate them into software designs
Implement development processes, coding best practices, and code reviews for production environments

REQUIREMENTS

Formal training in machine learning: dimensionality reduction, clustering, embeddings, and sequence classification algorithms
Experience with deep learning frameworks such as PyTorch, TensorFlow, and Hugging Face Transformers
Practical experience with Natural Language Processing methods and libraries such as spaCy, word2vec, TensorFlow, Keras, PyTorch, Flair, and BERT
Practical experience with large language models (prompt engineering, fine-tuning, and benchmarking) using frameworks such as LangChain and LlamaIndex
Strong Python background
Knowledge of AWS, GCP, Azure, or another cloud platform
Understanding of data modeling principles and complex data models
Proficiency with relational and NoSQL databases as well as vector stores (e.g., Postgres, Elasticsearch/OpenSearch, ChromaDB)
Knowledge of Scala, Spark, Ray, or other distributed computing systems highly preferred
Knowledge of API development, containerization, and machine learning deployment highly preferred
Experience with MLOps/AIOps highly preferred

PREFERRED QUALIFICATIONS

MS in Data Science, Computer Science, Statistics, Machine Learning, or a related field