At IQVIA, we are continuously expanding the boundaries of what's possible in clinical development through advanced analytics, cutting-edge technology, and deep scientific expertise. Within our Research & Development Solutions (RDS) organization, we are enhancing our services with agentic systems (autonomous AI agents that can reason, plan, act, and learn) to further streamline clinical trial workflows and accelerate the delivery of new therapies. By embedding these capabilities into our service offerings for our customers and the clinical sites that we engage with to run clinical trials, we not only strengthen our leadership in AI-driven clinical research but also bring life-changing treatments to patients faster and more efficiently.
We are seeking an experienced Senior Data Engineer to join our innovative AI team. In this role, you will lead the development and optimization of the data infrastructure supporting our cutting-edge Agentic AI initiatives. You will collaborate with ML engineers, AI scientists, and product managers to architect, implement, and maintain robust data pipelines that power autonomous AI agents. As a Senior Data Engineer in the R&DS AI Innovation Program, you will help shape our data strategy while ensuring our data solutions scale effectively to meet the demanding requirements of next-generation AI systems.
Key Responsibilities
Mandatory
- Design, develop, and maintain scalable data pipelines and ETL processes to support AI research and development.
- Collaborate with AI scientists and engineers to understand data requirements and ensure data availability and quality.
- Implement data governance and security measures to protect sensitive information.
- Monitor and troubleshoot data pipeline issues to ensure smooth operation.
- Stay updated with the latest advancements in data engineering and AI technologies.
- Design and implement scalable, resilient data architectures specifically tailored for AI agent training, fine-tuning, and inference workflows.
- Develop and maintain high-performance data pipelines utilizing modern orchestration frameworks to support real-time agent interactions and feedback loops.
Preferred
- Create specialized data storage and retrieval systems for efficient vector embeddings, knowledge graphs, and symbolic reasoning components used by AI agents.
- Implement robust data validation, monitoring, and governance frameworks to ensure high-quality training data for AI systems while maintaining compliance with privacy regulations.
- Continuously monitor and improve data system performance, focusing on reducing latency for agent decision-making processes.
Qualifications
Mandatory
- Education: Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field; advanced degree preferred.
- Experience: 5 years of professional experience in data engineering, with at least 2 years focused on ML/AI data infrastructure.
- Programming & Technologies:
  - Advanced proficiency in Python and Scala; experience with Rust, Go, Java, or Julia valued.
  - Expert-level knowledge of SQL and NoSQL databases.
  - Hands-on experience with vector databases (e.g., Pinecone, Weaviate, Milvus).
  - Proficiency with modern data orchestration platforms (e.g., Airflow 2.x).
- Cloud & Infrastructure:
  - Extensive experience with at least one major cloud platform (AWS, Azure, GCP).
  - Expertise in containerization and orchestration (Docker, Kubernetes).
  - Experience with Infrastructure as Code (e.g., Terraform).
- Data Processing:
  - Experience with distributed computing frameworks (Spark, Dask, Ray).
  - Proficiency with streaming technologies (e.g., Kafka, Flink).
  - Knowledge of modern data lakehouse architectures.
Preferred
- Certification in cloud platforms, big data technologies, engineering, or ML operations.
- Experience collaborating with ML engineers to implement CI/CD pipelines for data processing and model deployment, ensuring seamless integration between data infrastructure and AI development workflows.
- Working knowledge of ML frameworks (e.g., PyTorch, TensorFlow).
- Experience with feature stores and experiment tracking platforms.
- Understanding of LLM fine-tuning data requirements and processing. Experience developing data systems for autonomous AI agents or other agentic AI applications.
- Background in prompt engineering or retrieval-augmented generation systems.
- Experience with semantic caching and efficient storage/retrieval of AI-generated artifacts.
- Familiarity with LLM evaluation metrics and benchmarking frameworks.
- Expertise in building hybrid data architectures combining traditional databases with vector stores.
- Experience with RAG (Retrieval-Augmented Generation) systems and related data pipelines.
- Knowledge of reinforcement learning from human feedback (RLHF) data workflows.
- Experience mentoring junior engineers, establishing best practices, and contributing to architectural decisions across the organization's data infrastructure.
This role can be performed fully remotely. However, if you prefer working from an office environment, we are happy to accommodate that as well.
IQVIA is a leading global provider of clinical research services, commercial insights, and healthcare intelligence to the life sciences and healthcare industries. We create intelligent connections to accelerate the development and commercialization of innovative medical treatments to help improve patient outcomes and population health worldwide. Learn more at
Required Experience:
Senior IC