Staff Data Engineer
Cambridge, MA - USA
Job Summary
Iterative Health is a healthcare technology and services company powering the acceleration of clinical research to transform patient outcomes. The Iterative Health Site Network is a premier network of 70 clinical research sites across the US and Europe, accelerating the path to market for novel gastrointestinal (GI) and hepatology therapies. Our focus is on driving the success and growth of our partner sites by empowering them with tech-enabled services. By combining deep expertise in clinical trials with cutting-edge AI, we empower research teams and study sponsors to expand and expedite access to novel therapeutics for patients in need.
About the Role
Accelerating clinical research is one of the defining challenges in healthcare. Promising therapies exist that patients can't access because the operational infrastructure to run clinical trials efficiently doesn't exist yet. We're building it. That means designing technology systems that bring order to a fragmented landscape of clinical data sources, automating the operational work that slows trials down, and turning real-world clinical data into a foundation for predictive intelligence.
We're building a uniquely valuable data asset: real-world patient and research data flowing across 80 trial sites, spanning dozens of EHRs and clinical systems, and focused on patient populations that are chronically underserved by existing clinical research infrastructure. Your job is to build the pipelines, data models, and AI infrastructure that make this asset real, from ingestion and normalization through to the systems that power predictions on top of it. You'll own data quality and observability as foundational engineering problems. You'll also have a direct hand in shaping how this data drives our AI strategy: what we model, what we predict, and what becomes possible.
This is an opportunity for someone who wants to be part of a small, fast-moving engineering team at a formative stage. You'll shape what gets built, how decisions get made, and what the team becomes.
Responsibilities
- Own the data layer and architecture: the models, schemas, and infrastructure decisions that everything downstream depends on
- Build and operate the pipelines and transformations that move data from ingestion through normalization and enrichment into the formats that support analytics, ML training, and production model serving
- Own data quality and observability: build the systems that make data issues visible and correctable before they compound
- Partner with ML and engineering teams to identify what's modelable, define training data requirements, and build the data foundations for new predictive capabilities
- Define how clinical and operational data is governed across the system
- Evaluate and select the tools and technologies that make up the data stack, with a clear point of view on build vs. buy
- Help shape the engineering culture of a small, growing team: how technical decisions get made, how problems get debated, and what rigor looks like in practice
What We're Looking For
Required Qualifications
- 10 years of experience in data engineering or related roles, with significant time spent building data systems
- Experience with healthcare data (HL7, FHIR, claims, EHR extracts) or other complex regulated data domains; healthcare experience strongly preferred
- Deep experience modeling and integrating data from multiple heterogeneous sources with inconsistent schemas and quality
- Experience applying AI and LLMs to data engineering problems: extraction, normalization, classification, and entity resolution
- Strong understanding of how data infrastructure supports ML workflows, from feature engineering to training data pipelines to model serving
- Fluent in SQL and at least one modern programming language (Python, Java, Scala, Go), with experience across modern data infrastructure: distributed processing, streaming, cloud-native storage, orchestration, and transformation frameworks
- Have built data systems from early stages, making foundational decisions with incomplete information
- Naturally raise the quality of the engineering around you through code review, design guidance, and honest technical conversation
Preferred Qualifications
- Experience building data infrastructure that directly supports ML model training and evaluation
- Familiarity with clinical trial operations, EDC systems, or life sciences data
- SOC 2, HIPAA, or similar compliance experience baked into engineering practice
- A track record of building or improving data systems that others had given up on making reliable
New York pay range
$200,000 - $325,000 USD
At Iterative Health, we're actively working towards creating an environment that is representative of the diversity of patients our technology serves. We are focused on building an equitable and inclusive culture and, by extension, hiring process. If you require any accommodations to make the application process or interviewing experience more accessible to you, please contact
Required Experience:
Staff IC
About Company
Powering Exceptional GI Care. We help sites and sponsors accelerate trials, improve diagnostics, and expand patient access.