ONLY C2C
Maximum Pay Rate
USD 65.00
JOB#121927
Data Engineer
Vanguard Group
Worksite Address (Hybrid, 3 days onsite)
Job Description
Overview
We are seeking a highly experienced Data Engineer with 5 years of experience. This role is critical to hitting product rollout deadlines, as the team's work is a hard direct dependency for other product feature rollouts. The ideal candidate will be a hands-on developer with deep expertise in the AWS data stack, focusing primarily on data engineering and pipeline development.
Key Responsibilities
Develop and Implement Data Pipelines: Design, build, and maintain robust data pipelines, primarily using AWS Glue and PySpark (an illustrative sketch of this flow follows this list).
Data Sourcing and Transformation: Source data from various systems, including Redshift and Aurora, performing necessary streaming transformations and heavy data cleaning.
Data Delivery: Push the resulting cleaned datasets into S3 buckets.
External Integration: Manage the secure transfer of resulting files via SFTP to an external third-party company's server, adhering to non-negotiable external integration deadlines.
Collaboration: Work closely with the team to consult on the best and most efficient solutions for achieving required data outputs, given the constraints of the AWS Glue/PySpark environment.
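For illustration only, below is a minimal sketch of the kind of Glue/PySpark flow described above: read from Redshift through a Glue connection, clean the data with Spark, and write the result to S3. The connection name, table, columns, and bucket are placeholder assumptions, not actual project values, and the real pipelines would be defined by the team.

# Hypothetical AWS Glue job sketch (placeholder names throughout).
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "redshift_connection", "output_bucket"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: read a table from Redshift via an existing Glue connection (assumed).
source_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": args["redshift_connection"],
        "dbtable": "public.transactions",            # placeholder table
        "redshiftTmpDir": f"s3://{args['output_bucket']}/tmp/",
    },
)

# Transform: basic cleaning with Spark DataFrame operations.
df = source_dyf.toDF()
cleaned = (
    df.dropDuplicates()
      .na.drop(subset=["account_id"])                # placeholder key column
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Deliver: write the cleaned dataset to S3 as partitioned Parquet.
(
    cleaned.write.mode("overwrite")
           .partitionBy("load_date")                 # placeholder partition column
           .parquet(f"s3://{args['output_bucket']}/cleaned/transactions/")
)

job.commit()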
Required Qualifications and Skills
AWS Data Stack: Heavy expertise in the AWS ecosystem, specifically AWS Glue.
PySpark Expertise: Hands-on experience working with PySpark on complex application implementations is required.
Database Knowledge: Heavy knowledge of both relational databases (e.g., Redshift, Aurora) and NoSQL databases, and how to leverage them within the AWS Glue/PySpark environment.
Experience Level: Looking for experienced engineers with 5 years of experience.
Data Engineering Fundamentals: Strong general knowledge of how to efficiently source, transform, and push out data.
Industry
Banking, Financial Services & Insurance
Estimated Start Date
2/3/2026
Estimated End Date
8/7/2026