
Tech Lead - Data Engineering

Job Location

Dublin - Ireland

Monthly Salary

Not Disclosed


Vacancy

1 Vacancy

Job Description

Company Overview

Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead - Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.

Role Summary

As the Tech Lead - Data Engineering, you will be responsible for architecting, implementing, and optimizing end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while integrating with core AWS services (S3, Glue, Lambda, etc.). You will lead a technical team of data engineers, ensuring best practices in performance, security, and scalability. This role requires a deep, hands-on understanding of Databricks internals and a track record of delivering large-scale data platforms in a cloud environment.
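To give candidates a concrete feel for the stack, the sketch below shows the shape of pipeline this role owns: a PySpark job that reads raw files from S3 and writes a curated Delta Lake table. It is purely illustrative; the bucket paths and column names are hypothetical, and it assumes a Databricks runtime where `spark` and Delta Lake support are preconfigured.

    # Illustrative only: bucket paths and columns are hypothetical.
    # Assumes a Databricks cluster where `spark` is provided and
    # Delta Lake is the default table format.
    from pyspark.sql import functions as F

    raw = (
        spark.read
        .option("header", "true")
        .csv("s3://example-raw-bucket/trades/")      # raw landing zone
    )

    cleaned = (
        raw
        .withColumn("trade_date", F.to_date("trade_date"))
        .dropDuplicates(["trade_id"])                # idempotent re-runs
    )

    (
        cleaned.write
        .format("delta")                             # ACID, time travel
        .mode("overwrite")
        .save("s3://example-curated-bucket/trades_delta/")
    )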



Key Responsibilities

  1. Databricks Platform & Architecture
    • Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning.
    • Leverage Databricks SQL Analytics for interactive querying and report generation.
    • Manage the cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance.
    • Implement structured streaming pipelines for near real-time data ingestion and processing (see the first sketch after this list).
    • Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.
  2. AWS Cloud Integration
    • Integrate Databricks with AWS S3 as the primary data lake storage layer.
    • Design and implement ETL/ELT pipelines using the AWS Glue catalog, AWS Lambda, and AWS Step Functions where needed.
    • Ensure proper networking configuration (VPCs, security groups, private links) for secure and compliant data access.
    • Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.
  3. Data Pipeline & Workflow Management
    • Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala).
    • Orchestrate complex workflows, applying CI/CD principles (Git-based version control, automated testing).
    • Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations.
    • Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.
  4. Performance Tuning & Optimization
    • Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization; see the second sketch after this list).
    • Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance.
    • Optimize I/O performance and concurrency for large-scale data sets.
  5. Security & Governance
    • Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage.
    • Ensure compliance with industry standards (e.g. GDPR, SOC, ISO) and internal security policies.
    • Apply IAM best practices across Databricks and AWS to enforce least-privilege access.
  6. Technical Leadership & Mentorship
    • Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions.
    • Champion Agile or Scrum development practices, coordinating sprints and deliverables.
    • Serve as the primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.
  7. Monitoring & Reliability
    • Configure observability solutions (e.g. Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks.
    • Set up alerting mechanisms for latency, cost overruns, and cluster health (see the third sketch after this list).
    • Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.
  8. Innovation & Continuous Improvement
    • Stay current with the Databricks roadmap and emerging data engineering trends (e.g. Photon, Lakehouse features).
    • Evaluate new tools and technologies, driving POCs to improve data platform capabilities.
    • Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.
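First, to make the streaming responsibility in section 1 concrete: a minimal Structured Streaming sketch that ingests newly arrived JSON files from S3 and appends them to a Delta table, with checkpointing for recovery. The paths and trigger interval are hypothetical placeholders, and it assumes a Databricks runtime where `spark` and Auto Loader are available.

    # Minimal Structured Streaming sketch: paths are hypothetical;
    # assumes a Databricks runtime with `spark` and Delta preconfigured.
    stream = (
        spark.readStream
        .format("cloudFiles")                        # Databricks Auto Loader
        .option("cloudFiles.format", "json")
        .load("s3://example-landing-bucket/events/")
    )

    (
        stream.writeStream
        .format("delta")
        .option("checkpointLocation",
                "s3://example-curated-bucket/_checkpoints/events/")
        .trigger(processingTime="1 minute")          # near real-time micro-batches
        .start("s3://example-curated-bucket/events_delta/")
    )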

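Second, the tuning bullets in section 4 usually come down to a handful of Spark settings and join hints. The sketch below shows representative knobs; the configuration values and the toy DataFrames are illustrative starting points, not recommendations for any specific workload.

    # Representative tuning knobs; values are illustrative only.
    from pyspark.sql import functions as F

    spark.conf.set("spark.sql.shuffle.partitions", "400")    # size to data volume
    spark.conf.set("spark.sql.adaptive.enabled", "true")     # AQE re-plans at runtime

    # Toy stand-ins for a large fact table and a small dimension table.
    facts = spark.range(1_000_000).withColumnRenamed("id", "customer_id")
    dims = spark.range(100).withColumnRenamed("id", "customer_id")

    # Broadcast the small dimension table to avoid a shuffle join.
    joined = facts.join(F.broadcast(dims), "customer_id")

    # Cache only when a DataFrame is reused across several actions.
    joined.cache()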

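Third, for the alerting bullet in section 7, a common pattern is a CloudWatch alarm on a pipeline metric. The boto3 sketch below is hypothetical: the metric name, namespace, threshold, and SNS topic are placeholders, and it assumes the pipeline already publishes a custom latency metric.

    # Hypothetical alerting sketch using boto3 (AWS SDK for Python).
    # Metric name, namespace, threshold, and topic ARN are placeholders.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

    cloudwatch.put_metric_alarm(
        AlarmName="trades-pipeline-latency-high",
        Namespace="DataPlatform/Pipelines",          # custom metric namespace
        MetricName="PipelineLatencySeconds",
        Statistic="Average",
        Period=300,                                  # evaluate 5-minute windows
        EvaluationPeriods=3,                         # sustained, not transient
        Threshold=900.0,                             # alert past 15 minutes
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:eu-west-1:123456789012:data-alerts"],
    )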
Qualifications

  1. Educational Background
    • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or equivalent experience.
  2. Technical Experience
    • Databricks Expertise: 5 years of hands-on Databricks (Spark) experience, with a focus on building and maintaining production-grade pipelines.
    • AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
    • Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
    • Data Warehousing & Modeling: Familiarity with RDBMSs (e.g. Postgres, Redshift) and dimensional modeling techniques.
    • Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
    • Version Control & CI/CD: Git-based workflows (GitHub/GitLab) and Jenkins or similar CI/CD tools for automated builds and deployments.
  3. Leadership & Soft Skills
    • Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment.
    • Outstanding communication and stakeholder management skills, with the ability to translate technical jargon into business insights.
    • Adept at problem-solving, with a track record of quickly diagnosing and resolving data performance issues.
  4. Certifications (Preferred)
    • Databricks Certified Associate/Professional (e.g. Databricks Certified Professional Data Engineer).
    • AWS Solutions Architect (Associate or Professional).



Required Experience:

Staff IC

Employment Type

Full-Time
