This is a remote position.
We are seeking an AI Data Engineer (Python and ETL) to join our team.
Responsibilities:
Data Architecture & Schema Design: Design, implement, and manage robust data schemas and pipelines tailored for AI workflows across systems and integrations, including the core application, model training, fine-tuning, and evaluation.
Database Design & Data Modeling: Design and maintain scalable, efficient, and AI-optimized data models and database architectures (relational and NoSQL) to support data ingestion, transformation, and retrieval for generative AI and application needs.
Dataset Curation: Lead the creation, organization, and versioning of datasets used in model development (structured and unstructured), including data labeling and augmentation workflows.
Metadata & Lineage: Develop and maintain data and metadata tracking systems for datasets and AI models, enabling traceability, reproducibility, and responsible AI practices.
Data Governance & Security: Enforce data privacy compliance (e.g., GDPR, HIPAA) and security best practices throughout the data lifecycle.
Cross-functional Collaboration: Work closely with data scientists to understand data needs for fine-tuning and experimentation; partner with product teams to ensure data alignment with application requirements.
Quality & Validation: Implement automated validation, lineage tracking, and quality assurance mechanisms to ensure data reliability at scale.
Tooling & Automation: Build or integrate tools to support data versioning, synthetic data generation, and performance monitoring.
Documentation & Standards: Define and promote best practices for dataset documentation, data contracts, and data lineage to ensure consistency and usability across teams.
Requirements
Bachelor's or Master's degree in Computer Science, Data Engineering, Information Science, or a related field.
Proficiency in Python, SQL, and ETL.
Deep understanding of structured and unstructured data handling.
Strong grasp of data modeling, metadata systems, and schema evolution.
Experience implementing data governance, security, and privacy controls in regulated environments.
Familiarity with tools like DVC, MLflow, Hugging Face Datasets, or custom dataset/metadata management systems.
Benefits
- Work Location: Remote
- 5-day work week
At least 5 years of experience with data science.
Proven experience with analytics.
Outstanding analytical skills and the ability to take a practical approach to solving problems in real-world settings.
Ability to share your skills with other team members and contribute to learning as a group.
Able to manage timelines, quality, and delivery.
Effective collaborator who values cross-disciplinary delivery.
Experience in healthcare is NOT required.
Bonus points for experience with: Health Care Data, Tableau, Pandas, Matplotlib.