AI Data Engineer
Philadelphia, PA - USA
Job Summary
Mission
Join the Data Science team as an AI Data Engineer responsible for building the data foundations that make enterprise AI products accurate, explainable, and scalable. This role will design and implement Snowflake and dbt pipelines from raw source data to curated gold-layer datasets, create semantic models that LLM tools can use reliably, and partner with data science, product, and engineering teams to convert data dictionaries and business definitions into AI-ready data products. The ideal candidate is a strong data engineer with deep Snowflake/dbt experience and a practical understanding of how semantic layers, ER relationships, denormalized models, and metadata quality influence LLM and agent performance.
Position Overview
- Snowflake and dbt engineering: Design, build, optimize, and operate Snowflake pipelines and dbt models across raw, curated, and gold-layer datasets.
- AI-ready semantic modeling: Create semantic models, relationships, metrics, dimensions, and curated views that allow LLM tools and agents to answer questions accurately.
- Data dictionary-driven delivery: Translate team-defined data dictionaries, business definitions, and source mappings into tested, governed, and reusable data products.
- Agent consumption focus: Design datasets for AI agents, natural-language analytics, Snowflake Cortex Analyst, and other LLM-powered tools.
- Enterprise data modeling: Balance normalized source models, ER relationships, dimensional models, denormalized consumption layers, and semantic-layer needs.
Key Responsibilities
Snowflake, dbt, and Data Pipeline Development
- Build reliable data pipelines from raw source data through curated silver layers and business-ready gold layers using Snowflake and dbt.
- Develop modular dbt models, tests, documentation, exposures, and lineage-friendly transformation patterns.
- Implement incremental processing, snapshots, audit columns, reconciliation, data quality checks, and restartable pipeline patterns.
- Optimize Snowflake SQL and dbt workloads for performance, scalability, cost, and maintainability.
- Work with orchestration and DevOps/SRE teams to support CI/CD, environment promotion, pipeline monitoring, and operational runbooks.
Semantic Models and AI-Ready Data Products
- Create Snowflake semantic models and curated views that support accurate natural-language querying through Snowflake Cortex Analyst and related LLM tools.
- Translate approved data dictionaries into semantic model dimensions, facts, metrics, synonyms, descriptions, relationships, and business rules.
- Design ER relationships and join paths that are explicit, accurate, and easy for semantic-layer tools and AI agents to use.
- Create denormalized or consumption-optimized models where appropriate to reduce ambiguity and improve LLM answer quality.
- Partner with AI developers to understand tool schema needs, agent workflows, and how data model design affects LLM tool performance.
Data Modeling, Integration, and Consolidation
- Design logical and physical models that support enterprise data consolidation, analytical reporting, AI workflows, and business operations.
- Work across source systems, files, APIs, cloud storage, operational systems, and analytical platforms to integrate data into Snowflake.
- Create reusable patterns for source-to-target mapping, schema evolution, master/reference data alignment, and data product publishing.
- Collaborate with business and technical stakeholders to validate data definitions, grain, relationships, hierarchies, and measures.
- Support data consolidation across IntegriChain by rationalizing overlapping datasets and aligning enterprise definitions.
Snowflake Cortex and AI Platform Enablement
- Understand Snowflake Cortex capabilities, including Cortex Analyst, Cortex Complete, semantic views/models, and metadata-driven AI workflows.
- Prepare data models and semantic layers for accurate LLM usage, including clear naming, descriptions, relationships, metrics, and governance metadata.
- Support AI Explorer and similar applications by ensuring curated datasets are reliable, performant, explainable, and governed.
- Partner with AI and application teams to troubleshoot semantic model issues, poor AI answers, ambiguous joins, missing metadata, or incorrect measures.
- Contribute to standards for AI-ready data design, semantic model review, data dictionary alignment, and LLM-friendly data modeling.
Qualifications:
- 6 years of experience in data engineering, analytics engineering, database engineering, or data platform development in production environments.
- Strong hands-on experience with Snowflake, including SQL development, performance tuning, security-aware design, cost optimization, and large-volume processing.
- Strong hands-on experience with dbt or comparable ELT tooling, including models, tests, documentation, lineage, and environment promotion.
- Experience building raw-to-curated-to-gold data pipelines and business-ready datasets.
- Strong SQL and Snowflake development skills, including complex transformations, views, stored procedures/Snowflake Scripting, and query optimization.
- Experience creating semantic layers, semantic models, metrics, dimensions, relationships, and curated analytical views.
- Good understanding of ER modeling, dimensional modeling, denormalized consumption models, and data grain management.
- Experience translating data dictionaries and business definitions into physical models, dbt models, and semantic-layer definitions.
- Understanding of Snowflake Cortex capabilities such as Cortex Analyst, Cortex Complete, and semantic-model-driven natural-language querying.
- Ability to partner with data science, product, engineering, and business teams to deliver AI-ready data products.
Preferred Experience
- Experience in life sciences, healthcare, pharma commercialization, MDM, patient data, channel data, or commercial data platforms.
- Experience with Snowflake semantic views, Cortex Analyst, Cortex Search, or other AI/LLM data platform capabilities.
- Experience with data quality frameworks, metadata management, data observability, and lineage tooling.
- Experience with orchestration tools such as dbt Cloud jobs, Airflow, Dagster, cloud-native schedulers, or similar platforms.
- Experience with Python for data automation, metadata processing, testing, or API integrations.
- Experience designing governed data products for BI, AI/ML, natural-language analytics, or agentic applications.
- Snowflake SnowPro certification, dbt certification, or equivalent data engineering credentials.
Additional Information:
What does IntegriChain have to offer?
- Mission driven: Work with the purpose of helping to improve patients' lives!
- Excellent and affordable medical benefits, plus non-medical perks including Student Loan Reimbursement, Flexible Paid Time Off, and Paid Parental Leave
- 401(k) Plan with a Company Match to prepare for your future
- Robust Learning & Development opportunities including over 700 development courses free to all employees
IntegriChain is committed to equal treatment and opportunity in all aspects of recruitment, selection, and employment without regard to race, color, religion, national origin, ethnicity, age, sex, marital status, physical or mental disability, gender identity, sexual orientation, veteran or military status, or any other category protected under the law. IntegriChain is an equal opportunity employer, committed to creating a community of inclusion and an environment free from discrimination, harassment, and retaliation.
Our policy on visa sponsorship for US-based positions: Applicants for employment in the US must have valid work authorization that does not now and/or will not in the future require sponsorship of a visa for employment authorization in the US by IntegriChain.
Remote Work: Yes
Employment Type: Full-time
About Company
IntegriChain is the data and application backbone for market access departments of Life Sciences manufacturers. We deliver the data, the applications, and the business process infrastructure for patient access and therapy commercialization. More than 250 manufacturers rely on our ICyt ...