Data Engineering & Pipeline Development
Design, build, and maintain scalable and reliable data pipelines (batch and streaming).
Ingest and integrate data from multiple sources (SQL/NoSQL databases, APIs, files, cloud services).
Develop and maintain efficient ETL/ELT processes and data workflows (a minimal example is sketched after this list).
Ensure data quality, integrity, and availability across the data lifecycle.
Optimize data storage and processing for performance and cost efficiency.
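By way of illustration, a minimal sketch of the kind of batch ETL step these responsibilities involve; the orders.csv source, the column names, and the local SQLite target (standing in for a real warehouse) are hypothetical:

```python
# Minimal batch ETL sketch; file, columns, and target table are illustrative.
import sqlite3

import pandas as pd

def run_etl(source_path: str = "orders.csv", db_path: str = "warehouse.db") -> None:
    # Extract: read one raw batch (stand-in for a database, API, or cloud source).
    raw = pd.read_csv(source_path)

    # Transform: enforce basic quality rules before anything reaches the warehouse.
    clean = (
        raw.dropna(subset=["order_id"])            # reject rows missing the key
           .drop_duplicates(subset=["order_id"])   # enforce key uniqueness
           .assign(order_date=lambda df: pd.to_datetime(df["order_date"]))
    )

    # Load: append the curated batch into the target table.
    with sqlite3.connect(db_path) as conn:
        clean.to_sql("orders", conn, if_exists="append", index=False)

if __name__ == "__main__":
    run_etl()
```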
Data Platform & Architecture
Design and maintain modern data architectures (Data Lake, Data Warehouse, Lakehouse).
Implement scalable data models to support analytics and operational use cases.
Manage orchestration, scheduling, and monitoring of data pipelines (see the Airflow sketch after this list).
Maintain and improve cloud-based data infrastructure.
Apply data governance practices, version control, and technical documentation standards.
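For orientation, a minimal Apache Airflow sketch of the orchestration and scheduling responsibilities above; the DAG id, schedule, and task bodies are hypothetical placeholders:

```python
# Minimal Airflow DAG sketch; dag_id, schedule, and task logic are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    ...  # pull a batch from a source system

def transform_load() -> None:
    ...  # clean the batch and load it into the warehouse

with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="transform_load", python_callable=transform_load)

    extract_task >> load_task  # extract must finish before transform/load starts
```

Airflow's scheduler then handles retries, backfills, and run history, which is where the monitoring responsibility picks up.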
AI/ML & Data Infrastructure Support
Build and maintain data pipelines that support AI and Machine Learning use cases.
Prepare curated datasets and feature-ready data for Data Science teams.
Implement ingestion and processing pipelines for LLM-based applications.
Manage embedding pipelines and integrations with vector databases (a toy example follows this list).
Support RAG architectures from a data engineering and infrastructure perspective.
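To make the embedding and retrieval responsibilities concrete, a toy sketch: the embed() stub stands in for a real embedding model, and the NumPy array stands in for a real vector database.

```python
# Toy embedding-and-retrieval sketch; embed() is a placeholder, not a real model.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Deterministic pseudo-embedding (consistent within one process run);
    # a real pipeline would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize for cosine similarity

documents = ["refund policy", "shipping times", "warranty terms"]
index = np.stack([embed(d) for d in documents])  # one row per document

def retrieve(query: str, k: int = 2) -> list[str]:
    # On unit vectors, cosine similarity reduces to a dot product.
    scores = index @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("how long does delivery take?"))
```

In a RAG setup, the retrieved documents would then be injected into the LLM prompt as context.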
Monitoring, Reliability & Performance
Implement monitoring, alerting, and observability for data pipelines and workflows (a minimal quality gate is sketched after this list).
Detect and resolve data quality issues, pipeline failures, and performance bottlenecks.
Optimize queries, data models, and processing jobs to improve scalability and reliability.
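As one example of such a quality gate, a minimal sketch; the column names and thresholds are illustrative:

```python
# Minimal data-quality gate sketch; column names and thresholds are illustrative.
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.checks")

def check_batch(df: pd.DataFrame, max_null_ratio: float = 0.01) -> bool:
    """Return True only if the batch passes basic quality gates; log failures for alerting."""
    if df.empty:
        log.error("empty batch: upstream extract may have failed")
        return False
    null_ratio = df["customer_id"].isna().mean()
    if null_ratio > max_null_ratio:
        log.error("customer_id null ratio %.3f exceeds %.3f", null_ratio, max_null_ratio)
        return False
    if df.duplicated(subset=["order_id"]).any():
        log.error("duplicate order_id values detected in batch")
        return False
    log.info("batch of %d rows passed all checks", len(df))
    return True
```

A pipeline would call this between extract and load, routing failures to the alerting channel instead of the warehouse.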
Qualifications:
Requirements
Strong experience with Python, focused on data processing and pipeline development.
Advanced SQL skills and solid understanding of data modeling concepts.
Hands-on experience with:
ETL/ELT frameworks
Workflow orchestration tools (e.g., Airflow, Prefect, Dagster, or similar)
Distributed data processing frameworks (e.g., Apache Spark or similar; a minimal sketch follows this requirements list)
Experience working with cloud platforms, especially:
Microsoft Azure (e.g., Data Factory, Synapse, Fabric, Azure AI Foundry)
Google Cloud (e.g., BigQuery, Dataflow, Cloud Composer, or similar services)
Experience supporting LLM-based data infrastructure, including:
Vector databases
Embedding pipelines
Integration with frameworks such as LangChain (from a data engineering perspective)
Familiarity with BI tools and supporting analytical data models (Power BI, Tableau, Qlik).
Strong communication skills and ability to work in cross-functional teams.
A high level of English and Polish is a must.
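For the distributed-processing requirement above, a minimal PySpark sketch; the input path, columns, and aggregation are hypothetical:

```python
# Minimal PySpark aggregation sketch; paths and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue").getOrCreate()

# Read a partitioned dataset; Spark distributes the scan across executors.
orders = spark.read.parquet("s3://bucket/orders/")

# Aggregate in parallel, then write the result back as a mart table.
daily_revenue = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"),
               F.countDistinct("customer_id").alias("customers"))
)
daily_revenue.write.mode("overwrite").parquet("s3://bucket/marts/daily_revenue/")

spark.stop()
```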
Additional Information:
What do we offer you?
If you are passionate about data development & tech, we want to meet you!
Remote Work:
No
Employment Type:
Full-time
Talan is an international consulting and technology expertise group that accelerates the transformation of its clients by leveraging innovation, technology, and data. For over 20 years, Talan has been advising and supporting businesses and public institutions in the implementation of ...