DATA ENGINEER (Data Science & Big Data Analytics)
Job Summary
EURECAT
Eurecat is the second-largest Research & Technology Organisation in Spain and one of the largest applied research and technology transfer organisations in Southern
Europe. It brings together the experience of more than 800 professionals who generate an annual turnover of 69 million euros, and it provides services to more than 2,000 companies. Eurecat integrates advanced digital capabilities with experience in biotechnology, industry and sustainability, and collaborates with industry in RDI activities and projects, offering advanced scientific and technological services and specialised knowledge to respond effectively to the technological needs of very different business sectors, accelerating innovation while reducing both risks and spending on scientific and technological infrastructure. The technology center participates in more than 200 large national and international consortium projects of high strategic R&I value, holds 230 patents and has created 10 spin-offs. Eurecat has eleven centers in Catalonia and a presence in Madrid, Málaga and Chile.
Job description
You will join the Big Data & Data Science unit, a diverse team covering areas as varied as Computational Social Science, Cognitive Neuroscience and Trustworthy AI. We are looking for an intelligent and curious data engineer to help us translate applied research into tangible products and prototypes, working on real European research projects alongside researchers, software engineers and project managers.
FUNCTIONS AND RESPONSIBILITIES OF THE JOB:
- Design, build and maintain data pipelines (batch and streaming) that ingest data from heterogeneous sources into data lakes and warehouses, including metadata and lineage tracking.
- Contribute to the development of federated query and discovery systems over distributed datasets, working with engines such as Trino and integrating query optimizers compliant with privacy requirements.
- Contribute to the deployment of European data spaces (DeployEMDS) using standard building blocks from IDSA, Gaia-X and FIWARE, including data catalogues, brokers and connectors.
- Build and maintain orchestration workflows using Airflow or Dagster, following software engineering best practices (tests, code review, CI/CD).
- Package and deploy services using Docker and Docker Compose or similar.
- Support Machine Learning projects with data storage, serving and versioning infrastructure (object storage, SQL/NoSQL databases, feature stores).
- Collaborate on multi-cloud and on-premise deployments (e.g. Hetzner, Azure, bare metal) and contribute to infrastructure-as-code practices.
- Support the preparation of technical sections in EU-funded project proposals (Horizon Europe and similar) and contribute to scientific dissemination (papers, prototypes, demos).
Requirements
Studies
MSc in Computer Science, Data Engineering, Mathematics, Physics or a related technical field. A PhD or a specialised Master's will be highly valued.
Experience
At least 2 years of professional experience as a Data Engineer or in a closely related role.
Technical skills
- Strong Python proficiency, including modern tooling for clean code (type hints, linters/formatters such as Ruff, testing with pytest).
- Solid SQL skills and experience with relational databases (PostgreSQL, MySQL).
- Experience with at least one NoSQL or document database (Redis, Elasticsearch or similar).
- Experience building ETL/ELT data pipelines (Airflow, Dagster or similar).
- Working knowledge of object storage (S3, MinIO) and common serialization formats (Parquet, JSONL, Avro, BSON).
- Comfort on Linux and with the command line.
- Docker and Docker Compose for packaging and local development.
- Git and CI/CD workflows (GitHub Actions, GitLab CI or similar).
- Understanding of batch vs. streaming paradigms and event-driven architectures.
- Understanding of the difference between Data Lake and Data Warehouse architectures and when to use each.
Languages
- Excellent written and spoken English.
- Knowledge of Catalan and/or Spanish is a plus.
Nice-to-have
- Experience with distributed query engines (Trino, Presto, Dremio) and the concept of federated queries over heterogeneous data sources.
- Familiarity with European data space initiatives: IDSA, Gaia-X, FIWARE, DSSC, Eclipse Dataspace Components; data catalogues (CKAN), brokers and connectors.
- Big Data ecosystem: Apache Spark, Flink, Kafka, RabbitMQ, Hadoop.
- Kubernetes and Helm for production deployments.
- Infrastructure as Code with Terraform, Ansible or similar.
- Observability stacks: OpenTelemetry, Prometheus, Grafana, Loki or equivalents.
- Experience with cloud providers (Azure, AWS, GCP, Hetzner): serverless functions, managed storage, IAM.
- Graph databases (Neo4j) or time-series databases.
- Machine Learning fundamentals and familiarity with ML lifecycle tooling (MLflow, feature stores, model versioning).
- Concurrency and backend knowledge: async programming, multithreading, the actor model, message-driven systems.
- Additional programming languages: Java, Scala, Go or Rust.
- Participation in EU-funded research projects (Horizon Europe, Digital Europe) or scientific publications / conference presentations.
- Relevant certifications (cloud providers, Kubernetes CKA/CKAD, data platforms).
WHAT CAN EURECAT OFFER YOU
- Permanent contract.
- Hybrid work (home office / office).
- Flexible schedule.
- Shorter workday on Fridays and a summer schedule.
- Flexible remuneration package (health insurance, transport, lunch, studies/training and kindergarten).
- Eurecat employees can join Eurecat Academy courses.
- Language courses (English, Catalan and Spanish).
Required Experience:
IC