Senior Data Engineer, Data & Infrastructure

AstraZeneca


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 6 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Title: Senior Data Engineer - Data Pipelines

Introduction to role:

Are you ready to architect FAIR data platforms that accelerate discovery and turn complex science into deployable insights? Do you want your engineering decisions to remove data friction, power analytics, and help deliver life-changing medicines faster?

In this role you will design and operate the data foundations our scientists and analysts rely on to explore disease biology, generate evidence, and make bold decisions. You will work across high-performance computing and cloud environments to create secure, scalable pathways for data to move from experiments to models to actionable results.

You will join a collaborative, curious team that fuses data and technology with cutting-edge science. By building canonical models, trusted pipelines, and resilient infrastructure, you will help reduce time-to-insight, improve reproducibility, and enable the next wave of breakthroughs.

Accountabilities:

  • Data Platform Architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR) and remove barriers to scientific analysis.
  • Modeling and Warehousing: Define canonical data models and dimensional schemas; build lakehouse/warehouse layers that optimize storage and query performance to speed up evidence generation.
  • Data Integration: Create reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging to make data findable and trustworthy.
  • Governance and Quality: Establish and enforce standards for data quality, access control, retention, and compliance; implement monitoring and observability for proactive issue detection and continuous improvement.
  • Infrastructure Engineering: Operate solutions across Unix/Linux, HPC, and AWS cloud environments; engineer for reliability, cost efficiency, scalability, and sustainable performance.
  • Collaboration and Stakeholder Engagement: Translate scientific and business requirements into clear architectural designs; partner with CPSS stakeholders, R&D IT, and DS&AI to co-create solutions that deliver measurable value.
  • Engineering Excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability, resilience, and a high bar for software craftsmanship.
  • Enablement and Information Exchange: Produce documentation and reusable components that uplift data engineering practices across teams; mentor peers and champion platform adoption.

Essential Skills/Experience:

  • Data platform architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR).
  • Modeling and warehousing: Develop canonical data models, dimensional schemas, and lakehouse/warehouse layers; optimize storage and query performance.
  • Data integration: Build reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging.
  • Governance and quality: Establish standards for data quality, access control, retention, and compliance; implement monitoring and observability.
  • Infrastructure engineering: Operate solutions across Unix/Linux, HPC, and cloud environments (AWS preferred); ensure reliability, cost efficiency, and scalability.
  • Collaboration: Translate scientific and business requirements into architectural designs; partner with CPSS collaborators, R&D IT, and DS&AI to co-create solutions.
  • Engineering excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability and resilience.
  • Enablement: Produce documentation, reusable components, and guidance to uplift data engineering practices across teams.

Desirable Skills/Experience:

  • Hands-on expertise with Python or Scala and distributed data processing frameworks (Spark, PySpark); experience with SQL at scale.
  • Experience with modern lakehouse and warehouse technologies (Delta Lake, Apache Iceberg or Hudi, Redshift, Snowflake, Athena, BigQuery) and with data modeling tools and practices (Dimensional, Data Vault).
  • Familiarity with orchestration and data workflow tools (Airflow, Argo, Dagster), event streaming (Kafka, Kinesis), and metadata/governance platforms (Collibra, Alation, AWS Glue).
  • Cloud engineering skills in AWS services relevant to data (S3, EMR, Glue, Lambda, Step Functions, ECS/EKS) and infrastructure-as-code (Terraform, CloudFormation).
  • Operating experience in Unix/Linux HPC environments, job schedulers (SLURM), containerization, and secure data access patterns for scientific workloads.
  • Observability and reliability practices (Prometheus, Grafana, CloudWatch), cost optimization, and performance tuning for large-scale analytics.
  • Strong communication skills to align diverse collaborators, translate domain concepts into technical designs, and drive adoption through documentation and enablement.
  • Relevant certifications or demonstrated leadership in data platform architecture, governance, or cloud engineering.

When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace, and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.

Why AstraZeneca:
At AstraZeneca you will engineer where impact is immediate and visible: your pipelines will shape evidence, accelerate decisions, and help bring new treatments to people sooner. We bring experts from different fields together to solve hard problems quickly, backed by modern platforms across HPC and public cloud so your work runs at scale. Leaders remove barriers, teams share knowledge openly, and we value kindness alongside ambition, giving you room to innovate while staying grounded in real patient outcomes.

Call to Action:
If you are ready to architect the data flows that move science into the clinic send us your CV and tell us about the toughest pipeline you have built and scaled.

Date Posted

24-Dec-2025

Closing Date

05-Jan-2026

AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds with as wide a range of perspectives as possible and harnessing industry-leading skills. We believe that the more inclusive we are the better our work will be. We welcome and consider applications to join our team from all qualified candidates regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment) as well as work authorization and employment eligibility verification requirements.


Required Experience:

Senior IC


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company


AstraZeneca is an equal opportunity employer. AstraZeneca will consider all qualified applicants for employment without discrimination on grounds of disability, sex or sexual orientation, pregnancy or maternity leave status, race or national or ethnic origin, age, religion or belief, ...
