Data Engineer (Snowflake, ClickHouse)

Job Location: Limassol, Cyprus

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

A high-load in-house platform handling multiple third-party integrations, data collection, and comprehensive reporting systems. The system processes large volumes of real-time data and requires high availability and ongoing performance optimization.

Tasks

  • Design and maintain scalable ETL/ELT pipelines
  • Handle CDC ingestion from Amazon RDS (MySQL) via Debezium and Kafka
  • Load and optimize high-throughput data flows into ClickHouse and Snowflake
  • Tune ingestion in ClickHouse: partitioning, TTLs, ORDER BY keys
  • Use Parquet/ORC and partitioning for efficient storage in S3
  • Integrate datasets into Power BI
  • Implement monitoring and alerting for data pipelines
  • Use Git and CI/CD, and manage infrastructure with Terraform or AWS CDK

Requirements

  • Proven experience building high-throughput data ingestion pipelines
  • Proficiency with Kafka/MSK and Debezium
  • Strong knowledge of AWS Glue and S3
  • Experience with ClickHouse and Snowflake for analytics
  • Python, SQL, Git, CI/CD
  • Infrastructure-as-Code: Terraform or AWS CDK

Stack: AWS Glue, S3, Airflow/Dagster, Kafka/MSK, Debezium, MySQL (RDS), ClickHouse, Snowflake, Power BI, Terraform/CDK, Git, Python, SQL

Nice to have:

  • Experience designing and maintaining scalable ETL/ELT pipelines using AWS Glue, Airflow, and S3
  • Experience with dbt
  • Familiarity with MySQL binlog-based CDC
  • Power BI reporting knowledge
  • Experience monitoring and tuning ClickHouse clusters

Benefits

  • 24 working days of paid annual leave
  • 6 days of paid sick leave
  • Official employment
  • Medical insurance
  • Coffee zone with fruit and snacks in the office
  • Corporate lunch provided by the company
  • Gym and sports classes
  • Healthy and friendly work atmosphere

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala