We are looking for a Middle/Senior Data Engineer (ETL, Python, PySpark):
- Tech Level: Middle/Senior
- Language Proficiency: Upper-Intermediate
- FTE: 1
- Employment type: Full time
- Candidate Location: Poland
- Working Time Zone: CET. The team is distributed across Poland (CET), Prague, India (IST), and the US (EST). CET is preferred for overlap, but flexibility is possible given the global setup.
- Start: ASAP
- Planned Work Duration: 12
Technology Stack: Python, SQL, AWS, PySpark, Snowflake (must), GitHub Actions (must), Terraform (optional); Airflow, Datadog, or Dynatrace are a plus.
Customer Description:
Our Client is a leading global management consulting firm.
Numerous enterprise customers across industries rely on our Client's platform and services.
Project Description:
This project is part of a data initiative within the firm's secure technology ecosystem.
The focus is on building and maintaining robust data pipelines that collect and process data from multiple enterprise systems such as Jira, GitHub, AWS, ServiceNow, and other cloud infrastructure platforms.
The objective is to enable leadership to gain actionable insights aligned with strategic outcomes and to support product and service teams in targeting the right user groups and measuring the effectiveness of various GenAI productivity initiatives.
Project Phase: ongoing
Project Team: 10
Soft Skills:
- Problem-solving approach to work
- Ability to clarify requirements with the customer
- Willingness to pair with other engineers when solving complex issues
- Good communication skills
Hard Skills / Need to Have:
- Deep technical skills in AWS Glue (Crawler, Data Catalog)
- Hands-on experience with Python
- SQL experience
- Experience with Terraform or other Infrastructure-as-Code (IaC) tools is mandatory
- CI/CD with GitHub Actions
- DBT modelling
- Good understanding of AWS services like S3, SNS, Secrets Manager, Athena, and Lambda
- Additionally, familiarity with any of the following is highly desirable: Jira, GitHub, Snowflake.
Hard Skills / Nice to Have (Optional):
- Experience working with Snowflake and an understanding of Snowflake architecture, including concepts like internal and external tables, stages, and masking policies (see the brief sketch below).
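For orientation only, the snippet below is a minimal sketch of the Snowflake concepts mentioned above (an external stage over S3 and a column masking policy), driven from Python via the snowflake-connector-python client. Every account, credential, and object name is a hypothetical placeholder, not a detail of this project.

```python
# Illustrative sketch only: all account, object, and role names below are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount",     # placeholder Snowflake account identifier
    user="ETL_USER",
    password="***",          # in practice, read from a secrets store such as AWS Secrets Manager
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage pointing at raw files in S3 (assumes a storage integration already exists).
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw_s3_stage
      URL = 's3://example-raw-bucket/exports/'
      STORAGE_INTEGRATION = example_s3_integration
""")

# Masking policy: only a privileged role sees the unmasked value.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING)
      RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('DATA_ADMIN') THEN val ELSE '***MASKED***' END
""")
cur.execute("ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY email_mask")

cur.close()
conn.close()
```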
Responsibilities and Tasks:
Building and maintaining end-to-end ETL pipelines, primarily using AWS Glue and PySpark, with Snowflake as the target data warehouse (an illustrative sketch follows this list):
- New development, enhancements, defect resolution, and production support of ETL processes built on AWS-native services.
- Integration of data sets using AWS services such as Glue and Lambda functions.
- Utilization of AWS SNS to send emails and alerts.
- Authoring ETL processes using Python and PySpark.
- ETL process monitoring using CloudWatch Events.
- Connecting to different data sources such as S3 and validating data using Athena.
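To give a concrete, hedged picture of the kind of pipeline step these responsibilities describe, here is a minimal PySpark sketch: it reads raw JSON from S3, applies a small transformation, loads the result into Snowflake via the Spark Snowflake connector, and publishes an SNS alert on failure. All bucket, table, topic, and credential values are hypothetical placeholders; a real Glue job would also use the Glue job APIs, pull secrets from Secrets Manager, and ship the connector dependencies.

```python
# Minimal illustrative ETL sketch. All resource names and credentials are placeholders.
import boto3
from pyspark.sql import SparkSession, functions as F

SF_OPTIONS = {  # Spark Snowflake connector options (placeholder values)
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "ETL_USER",
    "sfPassword": "***",          # in practice, fetched from AWS Secrets Manager
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "ETL_WH",
}


def run() -> None:
    spark = SparkSession.builder.appName("jira-issues-etl").getOrCreate()

    # Extract: raw JSON exports landed in S3.
    raw = spark.read.json("s3://example-raw-bucket/jira/issues/")

    # Transform: keep only the fields needed downstream and stamp the load time.
    issues = (
        raw.select("key", "fields.status.name", "fields.updated")
           .withColumnRenamed("name", "status")
           .withColumn("load_ts", F.current_timestamp())
    )

    # Load: write to Snowflake through the Spark Snowflake connector.
    (issues.write
           .format("net.snowflake.spark.snowflake")
           .options(**SF_OPTIONS)
           .option("dbtable", "JIRA_ISSUES")
           .mode("append")
           .save())


if __name__ == "__main__":
    try:
        run()
    except Exception as exc:
        # Alerting: publish a failure notification to an SNS topic.
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:eu-central-1:123456789012:etl-alerts",
            Subject="ETL job failed",
            Message=str(exc),
        )
        raise
```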
Ready to Join?
We look forward to receiving your application and welcoming you to our team!