We are seeking a highly experienced, hands-on Lead/Senior Data Engineer to architect, develop, and optimize data solutions in a cloud-native environment. The ideal candidate will have 7-12 years of experience and strong technical expertise in AWS Glue, PySpark, and Python, along with a track record of designing robust data pipelines and frameworks for large-scale enterprise systems. Prior exposure to the financial domain or regulated environments is a strong advantage.
Key Responsibilities:
Solution Architecture: Design scalable and secure data pipelines using AWS Glue, PySpark, and related AWS services (EMR, S3, Lambda, etc.).
Leadership & Mentorship: Guide junior engineers, conduct code reviews, and enforce best practices in development and deployment.
ETL Development: Lead the design and implementation of end-to-end ETL processes for structured and semi-structured data.
Framework Building: Develop and evolve data frameworks, reusable components, and automation tools to improve engineering productivity.
Performance Optimization: Optimize large-scale data workflows for performance, cost, and reliability.
Data Governance: Implement data quality, lineage, and governance strategies in compliance with enterprise standards.
Collaboration: Work closely with product, analytics, compliance, and DevOps teams to deliver high-quality solutions aligned with business goals.
CI/CD Automation: Set up and manage continuous integration and deployment pipelines using AWS CodePipeline, Jenkins, or GitLab.
Documentation & Presentations: Prepare technical documentation and present architectural solutions to stakeholders across levels.
Requirements
Required Qualifications:
7-12 years of experience in data engineering or related fields.
Strong expertise in Python programming with a focus on data processing.
Extensive experience with AWS Glue (both Glue Jobs and Glue Studio/Notebooks).
Deep hands-on experience with PySpark for distributed data processing.
Solid AWS knowledge: EMR, S3, Lambda, IAM, Athena, CloudWatch, Redshift, etc.
Proven experience architecting and managing complex ETL workflows.
Proficiency with Apache Airflow or similar orchestration tools.
Hands-on experience with CI/CD pipelines and DevOps best practices.
Familiarity with data quality, data lineage, and metadata management.
Strong experience working in agile/scrum teams.
Excellent communication and stakeholder engagement skills.
Preferred/Good to Have:
Experience in financial services, capital markets, or compliance systems.
Knowledge of data modeling, data lakes, and data warehouse architecture.
Familiarity with SQL (Athena/Presto/Redshift Spectrum).
Exposure to ML pipeline integration or event-driven architecture is a plus.
Benefits
Flexible work culture and remote options
Opportunity to lead cutting-edge cloud data engineering projects
Skill-building in large-scale, regulated environments