High-load in-house platform handling multiple third-party integrations, data collection, and comprehensive reporting systems. The system processes large volumes of real-time data and requires high availability and continuous performance optimization.
Tasks
- Design and maintain scalable ETL/ELT pipelines
- Handle CDC ingestion from Amazon RDS (MySQL) via Debezium and Kafka (see the connector sketch after this list)
- Load and optimize high-throughput data flows into ClickHouse and Snowflake
- Tune ingestion in ClickHouse: partitioning, TTLs, and ORDER BY keys (DDL sketch below)
- Use Parquet/ORC and partitioning for efficient storage in S3 (example below)
- Integrate datasets into Power BI
- Implement monitoring and alerting for data pipelines (alerting sketch below)
- Use Git-based CI/CD and manage infrastructure with Terraform or AWS CDK (CDK sketch below)
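
A minimal sketch of the CDC setup named above: registering a Debezium MySQL connector with the Kafka Connect REST API from Python. The endpoint, credentials, and table list are placeholders, and exact config keys differ slightly between Debezium 1.x and 2.x.

```python
import json

import requests

CONNECT_URL = "http://kafka-connect.internal:8083/connectors"  # assumed endpoint

connector = {
    "name": "rds-mysql-cdc",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "my-rds.abc123.eu-west-1.rds.amazonaws.com",  # placeholder
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "${secrets:cdc/mysql:password}",  # placeholder; use a config provider
        "database.server.id": "184054",  # must be unique among binlog clients
        "topic.prefix": "rds",           # Debezium 2.x; 1.x uses database.server.name
        "table.include.list": "app.orders,app.events",
        # Debezium reads the MySQL binlog, so schema history is kept in Kafka:
        "schema.history.internal.kafka.bootstrap.servers": "msk-broker-1:9092",
        "schema.history.internal.kafka.topic": "schema-history.rds",
    },
}

resp = requests.post(CONNECT_URL, json=connector, timeout=30)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```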
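The ClickHouse tuning knobs listed above (partitioning, TTLs, ORDER BY keys) shown in one illustrative DDL, issued through the clickhouse-connect client; the host, schema, and retention period are assumptions.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal")  # assumed host

client.command("""
    CREATE TABLE IF NOT EXISTS analytics.events
    (
        event_time DateTime,
        account_id UInt64,
        event_type LowCardinality(String),
        payload    String
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_time)              -- monthly partitions keep merges cheap
    ORDER BY (account_id, event_type, event_time)  -- sort key should match common filters
    TTL event_time + INTERVAL 90 DAY               -- expire raw rows after 90 days
""")
```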
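One way to produce the partitioned Parquet layout mentioned above is awswrangler (the AWS SDK for pandas); the bucket, prefix, and columns here are invented for illustration.

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame(
    {
        "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "event_type": ["click", "view", "click"],
        "value": [1, 3, 2],
    }
)

wr.s3.to_parquet(
    df=df,
    path="s3://my-data-lake/events/",  # assumed bucket/prefix
    dataset=True,                      # enables partitioned, Glue-style layout
    partition_cols=["event_date"],     # one S3 prefix per date
    compression="snappy",
)
```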
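A sketch of pipeline alerting as an Airflow failure callback; the Slack webhook is a stand-in for whatever pager or chat hook the team actually uses.

```python
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def notify_failure(context):
    """Post the failed task and run date to a chat channel."""
    ti = context["task_instance"]
    requests.post(
        SLACK_WEBHOOK,
        json={"text": f"Pipeline failure: {ti.dag_id}.{ti.task_id} ({context['ds']})"},
        timeout=10,
    )


def load_events():
    raise NotImplementedError("ETL step goes here")


with DAG(
    dag_id="events_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
    default_args={"on_failure_callback": notify_failure},
) as dag:
    PythonOperator(task_id="load_events", python_callable=load_events)
```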
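Finally, a small AWS CDK (Python) stack as an infrastructure-as-code sketch: one raw-data bucket with a lifecycle rule. The bucket name and retention period are assumptions.

```python
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3


class DataLakeStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(
            self,
            "RawEvents",
            bucket_name="my-data-lake-raw",  # assumed; must be globally unique
            versioned=True,
            lifecycle_rules=[
                # Expire raw objects after a year (illustrative retention)
                s3.LifecycleRule(expiration=cdk.Duration.days(365))
            ],
        )


app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```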
Requirements
- Proven experience building high-throughput data ingestion pipelines
- Proficiency with Kafka/MSK and Debezium
- Strong knowledge of AWS Glue and S3
- Experience with ClickHouse and Snowflake for analytics
- Python, SQL, Git, CI/CD
- Infrastructure-as-Code: Terraform or AWS CDK
Stack: AWS Glue, S3, Airflow/Dagster, Kafka/MSK, Debezium, MySQL (RDS), ClickHouse, Snowflake, Power BI, Terraform/CDK, Git, Python, SQL
Nice to have
- Experience designing and maintaining ETL/ELT pipelines with AWS Glue, Airflow, and S3
- Experience with dbt
- Familiarity with MySQL binlog-based CDC
- Power BI reporting knowledge
- Experience monitoring and tuning ClickHouse clusters (see the sketch after this list)
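
For the ClickHouse monitoring item, a common first check is part sizes from the system.parts system table; the host is assumed and the query is illustrative.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal")  # assumed host

# Find the ten largest active partitions: oversized partitions are a
# frequent cause of slow merges and slow queries.
rows = client.query("""
    SELECT database, table, partition,
           sum(rows)                               AS total_rows,
           formatReadableSize(sum(bytes_on_disk))  AS readable_size
    FROM system.parts
    WHERE active
    GROUP BY database, table, partition
    ORDER BY sum(bytes_on_disk) DESC
    LIMIT 10
""").result_rows

for database, table, partition, total_rows, readable_size in rows:
    print(f"{database}.{table} partition {partition}: {total_rows} rows, {readable_size}")
```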
Benefits
- 24 working days of paid annual leave
- 6 days of paid sick leave
- Official employment
- Medical insurance
- Coffee zone with fruit & snacks in the office
- Corporate lunch provided by the company
- Gym and sports classes
- Healthy and friendly work atmosphere