Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead, Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.
Role Summary
As the Tech Lead, Data Engineering, you will be responsible for architecting, implementing, and optimizing end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while integrating with core AWS services (S3, Glue, Lambda, etc.). You will lead a technical team of data engineers, ensuring best practices in performance, security, and scalability. This role requires a deep, hands-on understanding of Databricks internals and a track record of delivering large-scale data platforms in a cloud environment.
Key Responsibilities
- Databricks Platform & Architecture
- Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning.
- Leverage Databricks SQL Analytics for interactive querying and report generation.
- Manage cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance.
- Implement structured streaming pipelines for near real-time data ingestion and processing.
- Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.
- AWS Cloud Integration
- Integrate Databricks with AWS S3 as the primary data lake storage layer.
- Design and implement ETL/ELT pipelines using the AWS Glue Data Catalog, AWS Lambda, and AWS Step Functions where needed.
- Ensure proper networking configuration (VPCs, security groups, AWS PrivateLink) for secure and compliant data access.
- Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.
- Data Pipeline & Workflow Management
- Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala).
- Orchestrate complex workflows applying CI/CD principles (Git-based version control, automated testing).
- Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations.
- Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.
- Performance Tuning & Optimization
- Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization).
- Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance.
- Optimize I/O performance and concurrency for large-scale data sets.
- Security & Governance
- Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage.
- Ensure compliance with industry standards (e.g., GDPR, SOC, ISO) and internal security policies.
- Apply IAM best practices across Databricks and AWS to enforce least-privilege access.
- Technical Leadership & Mentorship
- Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions.
- Champion Agile or Scrum development practices, coordinating sprints and deliverables.
- Serve as a primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.
- Monitoring & Reliability
- Configure observability solutions (e.g., Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks.
- Set up alerting mechanisms for latency, cost overruns, and cluster health.
- Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.
- Innovation & Continuous Improvement
- Stay updated on the Databricks roadmap and emerging data engineering trends (e.g., Photon, Lakehouse features).
- Evaluate new tools and technologies, driving POCs to improve data platform capabilities.
- Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.
Qualifications
- Educational Background
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or equivalent experience.
- Technical Experience
- Databricks Expertise: 5 years of hands-on Databricks (Spark) experience with a focus on building and maintaining production-grade pipelines.
- AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
- Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
- Data Warehousing & Modeling: Familiarity with RDBMS (e.g., Postgres, Redshift) and dimensional modeling techniques.
- Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
- Version Control & CI/CD: Git-based workflows (GitHub/GitLab) and Jenkins or similar CI/CD tools for automated builds and deployments.
- Leadership & Soft Skills
- Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment.
- Outstanding communication and stakeholder management skills with the ability to translate technical jargon into business insights.
- Adept at problem-solving with a track record of quickly diagnosing and resolving data performance issues.
- Certifications (Preferred)
- Databricks Certified Associate/Professional (e.g., Databricks Certified Data Engineer Professional).
- AWS Solutions Architect (Associate or Professional).
Required Experience:
Staff IC