Hi,
I hope you're doing well. I had a chance to review your profile and wanted to discuss a full-time, direct-hire position with our client, a major Systems Integrator.
Please review the JD below and let me know if you would be interested in exploring the opportunity.
Job Title: Data Engineering Lead
Location: New York, NY - Onsite
Duration: Full-time
Job Description
Must-Have Technical/Functional Skills
AWS Data Engineering Services (EMR/Glue, Redshift, Aurora, S3, Lambda), Spark, Python, Collibra, Snowflake/Databricks, Tableau.
Roles & Responsibilities
- Ingest and model data from APIs, files/SFTP, and relational sources; implement layered architectures (raw/clean/serving) using PySpark/SQL, dbt, and Python.
- Design and operate pipelines with Prefect (or Airflow), including scheduling, retries, parameterization, SLAs, and well-documented runbooks.
- Build on cloud data platforms, leveraging S3/ADLS/GCS for storage and a Spark platform (e.g., Databricks or equivalent) for compute; manage jobs, secrets, and access.
- Publish governed data services and manage their lifecycle with Azure API Management (APIM): authentication/authorization policies, versioning, quotas, and monitoring.
- Enforce data quality and governance through data contracts, validations/tests, lineage, observability, and proactive alerting.
- Optimize performance and cost via partitioning, clustering, query tuning, job sizing, and workload management.
- Uphold security and compliance (e.g., PII handling, encryption, masking) in line with firm standards.
- Collaborate with stakeholders (analytics, AI engineering, and business teams) to translate requirements into reliable, production-ready datasets.
- Enable AI/LLM use cases by packaging datasets and metadata for downstream consumption, integrating via the Model Context Protocol (MCP) where appropriate.
- Continuously improve platform reliability and developer productivity by automating routine tasks, reducing technical debt, and maintaining clear documentation.
- 4-15 years of professional data engineering experience.
- Strong Python, SQL, and Spark (PySpark) and/or Kafka skills.
- Snowflake (Snowpipe, Tasks, Streams) as a complementary warehouse.
- Databricks (Delta formats, workflows, cataloging) or equivalent Spark platforms.
- Hands-on experience building ETL/ELT with Prefect (or Airflow), dbt, Spark, and/or Kafka.
- Experience onboarding datasets to cloud data platforms (storage, compute, security, governance).
- Familiarity with Azure/AWS/GCP data services (e.g., S3/ADLS/GCS; Redshift/BigQuery; Glue/ADF).
- Git-based workflows, CI/CD, and containerization with Docker (Kubernetes a plus).
Generic Managerial Skills (if any)
- Strategic Technical Leadership: Defining data architecture, evaluating new technologies, and setting technical standards for AWS-based pipelines.
- Stakeholder Communication: Bridging the gap between technical teams and business stakeholders, gathering requirements, and reporting progress.
- Risk Management: Proactively identifying potential bottlenecks in data workflows, security risks, or scalability issues.
- Operational Excellence: Implementing automation, optimizing costs, and maintaining high data quality standards.
Thanks & Regards,
Sumit Goyal
Sr. Technical Recruiter