Research Scientist I, SMaHT
Cambridge, MA - USA
Job Summary
General information
Description & Requirements
The Somatic Mosaicism across Human Tissues (SMaHT) Network was established by the NIH to catalog somatic genetic variation across human tissues and discover its patterns, causes, and consequences. This effort includes all classes of somatic mutation: single nucleotide variants, short insertions and deletions, structural variants, and other large chromosomal aberrations. The somatic mutation catalog will enable downstream analyses such as estimating rates and burdens of mutations across tissues, mutational signature analysis, identification of driver mutations and clonal expansions, and lineage tracing.
The Broad Institute is one of five Genome Characterization Centers (GCCs) tasked with delivering the sequencing data underpinning this somatic mutation catalog. As part of this effort, we are looking for a highly motivated and talented individual with a computational background to lead data curation and analyses for this ambitious project. Some of our recent work includes Coorens et al., Nature 2025.
The successful candidate will join an interdisciplinary team working with an unprecedented set of multimodal data from a wide range of human tissues and donors, including extensive deep short- and long-read genomics, transcriptomics, epigenomics, and duplex sequencing data. The scope of this project provides unique opportunities for developing novel analytical methods for data QC, integration, detection of somatic mutations, multi-tissue analyses, and integration with transcriptomic data.
Responsibilities will include overseeing the implementation of experimental work plans, pipelines for data processing and organization, data submission timelines, and analysis, and contributing to budgetary and operational matters. In addition, the individual is expected to clearly communicate scientific details, results, and strategic considerations to others within the team and the SMaHT Network at large. This role will require strategic coordination of multiple groups at the Broad Institute and within the SMaHT Network. This individual will serve as a key contact for project leaders, collaborators on the project (specifically the Data Analysis Center), and other staff.
PRINCIPAL DUTIES AND RESPONSIBILITIES
- Design and execute data QC and analysis strategies involving multimodal human tissue datasets, and specifically lead whole-genome short- and long-read DNA sequencing, RNA sequencing, and somatic mutation analyses. Prior experience working with long-read genomics and transcriptomics data is required.
- Apply and develop state-of-the-art computational tools and pipelines to a) assess data quality, b) integrate diverse data types and metadata, and c) detect somatic mutations and perform subsequent downstream analyses.
- Collaborate with and provide analytical support for internal technology development efforts for the application of novel strategies to detect somatic mutations.
- Together with others, develop and evaluate new methods for integrative analysis of these genomic data types.
- Present ideas and results to the multi-disciplinary members of the SMaHT Network. Prepare written reports and presentations for internal use as well as presentations at SMaHT Network and other conferences.
QUALIFICATIONS
- PhD in Genomics, Bioinformatics, Computational Biology, Computer Science, Statistics, Math, or a related quantitative field is required, with 2 years of industry experience.
- Experience with computational analysis, algorithm development, and statistics is expected.
- Proven track record of leading complex data curation or analysis projects, ideally within large-scale consortia, is a strong plus.
- Deep Sequencing Expertise: Extensive experience with analyzing high-throughput biological data, specifically long-read genomic (PacBio/Oxford Nanopore) and transcriptomic (RNA-seq) data, is required.
- Pipeline Development: Proficiency in developing and maintaining reproducible computational pipelines using languages such as Python, R, or C and workflow managers like WDL, Nextflow, or Snakemake.
- Cloud Computing: Experience working in cloud-based environments (e.g., Google Cloud Platform/Terra, AWS) to manage and process petabyte-scale datasets.
- Somatic Mutation Analysis: Strong background in detecting and interpreting single nucleotide variants (SNVs), indels, and structural variants (SVs) in human samples.
- Multimodal Data Integration: Demonstrated ability to integrate and analyze diverse data types, including transcriptomics (RNA-seq), epigenomics, and duplex sequencing, is a strong plus.
- Strategic Coordination: Ability to manage timelines and deliverables across multiple interdisciplinary groups, both within the Broad Institute and among external collaborators within the network (e.g., the Data Analysis Center).
- Scientific Communication: Excellent verbal and written communication skills, with the ability to present complex technical results to both specialist and generalist audiences.
- Adaptability: A high degree of motivation to work in a fast-paced, evolving research environment on a high-stakes, 5-year, NIH-funded initiative.
Required Experience:
IC
About Company
Broad Institute is a multidisciplinary community of researchers on a mission to improve human health.