(ID: 2026-1886)
Position Overview
NGRF (NovaGen Research Fund) is a new data science collaboration hub including multiple labs and centers. NGRF is seeking a researcher to join the LLM Lab and contribute to NGRF's scientific discovery platform. This role combines original research in scientific integrity with hands-on development of NLP and AI capabilities on a large-scale biomedical data platform.
The researcher will work with a multi-terabyte curated literature database, develop and evaluate NLP models, build retrieval-augmented generation (RAG) pipelines, and contribute to a federated knowledge graph that models scientific claims and evidence relationships. The role spans research and engineering: publishing peer-reviewed findings while building the models and pipelines that power NGRF's platform.
Key Responsibilities
Scientific integrity research: Design and execute large-scale computational analyses of scientific publishing practices across the biomedical literature. Publish findings in peer-reviewed venues (e.g., Scientometrics, JASIST, Quantitative Science Studies).
NLP model development: Develop and validate models for semantic analysis of scientific text, including claim extraction, relevance scoring, and relationship detection. Build on existing RAG infrastructure and vector search capabilities.
Big data and pipeline engineering: Work with a multi-terabyte literature database and external APIs (PubMed, CrossRef, OpenAlex) to build scalable data processing and analysis pipelines. Integrate structured and unstructured data sources into reproducible workflows.
Collaboration and outreach: Work with external partners in the research integrity community. Present findings at conferences and contribute to NGRF's visibility in the scientific integrity space.
Required Qualifications
PhD in natural language processing, computational linguistics, computer science, information science, or a related field
Publication record in NLP, text mining, or computational approaches to scientific literature
Strong programming skills in Python, with experience in modern ML/NLP frameworks (e.g., PyTorch, Hugging Face Transformers)
Experience working with large text corpora and large-scale datasets
Familiarity with vector databases, embeddings, semantic search, and retrieval-augmented generation (RAG)
Experience with knowledge graphs, ontology design, or graph databases
Demonstrated ability to design and execute independent research projects
Strong written and oral communication skills
Preferred Qualifications
Background in scientometrics, bibliometrics, or research integrity
Experience with scientific publishing APIs and metadata (DOIs, PubMed, CrossRef, OpenAlex)
Familiarity with claim extraction, relation extraction, or scientific argument mining
Proficiency with Git, GitHub, and collaborative software development workflows
Experience with Linux/Unix environments and command-line tools
Familiarity with containerization (Docker) and CI/CD pipelines
Experience with AI-assisted development tools (e.g., Claude Code, GitHub Copilot)
Experience deploying NLP models in production or near-production settings
Familiarity with Kubernetes, infrastructure-as-code, or cloud platforms (AWS)
Experience with SQL databases (PostgreSQL) and data modeling
Track record of interdisciplinary collaboration
Compensation
The prospective salary range for this position is $80,000 to $92,000 annually.
Salary Range
$80,000 - $92,000 USD
Disclaimer: The above description is meant to illustrate the general nature of work and level of effort being performed by individuals assigned to this position or job description. This is not restricted as a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.
The diversity of NovaGen's employees is a tremendous asset. We are firmly committed to providing equal opportunity in all aspects of employment and will not tolerate any illegal discrimination or harassment based on age, race, gender, religion, national origin, disability, marital status, covered veteran status, sexual orientation, status with respect to public assistance, and other characteristics protected under state, federal, or local law and to deter those who aid, abet, or induce discrimination or coerce others to discriminate.
Accessibility: If you need an accommodation as part of the employment process, please contact:
This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate's experience, qualifications, skills, and location.