Job Description: Senior AWS PySpark Engineer
Role Summary:
This role is responsible for building and orchestrating cloud-native ETL pipelines using AWS Glue, Airflow (MWAA), PySpark, and S3 to support the modernization of legacy Informatica workloads. The engineer will help design the cloud execution layer for migrated PySpark-based pipelines.
Key Responsibilities:
- Build serverless ETL pipelines using AWS Glue with PySpark, reading from and writing to S3 (a minimal job sketch follows this list).
- Design and implement workflow orchestration using Airflow (MWAA); see the sample DAG after this list.
- Support cloud-native data ingestion, transformation, and data lake integration.
- Monitor, troubleshoot, and tune the performance of data pipelines in AWS.
- Collaborate with developers and architects to integrate migrated pipelines into production.
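For illustration, a minimal sketch of the kind of serverless Glue job this role builds. It follows the standard AWS Glue PySpark job skeleton; the bucket paths and column names are hypothetical placeholders, not part of any actual pipeline:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap: resolve arguments and initialize contexts
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSV data from S3 (bucket and columns are hypothetical)
df = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

# Simple transformation: drop cancelled orders and cast the amount column
cleaned = (
    df.filter(F.col("status") != "CANCELLED")
      .withColumn("amount", F.col("amount").cast("double"))
)

# Write curated output back to S3 as partitioned Parquet
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)

job.commit()
```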
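A matching MWAA orchestration sketch that triggers a Glue job like the one above, assuming Airflow 2.4+ with the Amazon provider package installed; the DAG ID, Glue job name, and region are hypothetical:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# DAG and Glue job names below are placeholders for illustration
with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # "schedule" assumes Airflow 2.4+
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_orders_glue_job",
        job_name="orders-etl-job",   # an existing Glue job definition
        region_name="us-east-1",
        wait_for_completion=True,    # block the task until the job finishes
    )
```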
Required Skills:
- Strong hands-on experience with PySpark and distributed data transformation (a brief example appears after this list).
- Proficiency in SQL for complex joins, filters, and data validations.
- Strong experience with AWS Glue, Airflow (MWAA), and S3; broader exposure to the AWS data stack is preferred.
- Solid Python-based data engineering skills.
- Familiarity with legacy ETL modernization and migration best practices.
- Understanding of infrastructure-as-code and deployment automation is a plus.
- Experience integrating metadata-driven frameworks is preferred.
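For illustration, a brief sketch of the PySpark-plus-SQL join and validation work this role expects, written as a standalone script; the table paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("validation-example").getOrCreate()

# Register S3-backed datasets as temp views (paths are placeholders)
orders = spark.read.parquet("s3://example-curated-bucket/orders/")
customers = spark.read.parquet("s3://example-curated-bucket/customers/")
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# A join plus filter expressed in SQL, used as a referential-integrity check:
# flag orders that reference a customer that does not exist
orphans = spark.sql("""
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""")

orphan_count = orphans.count()
if orphan_count > 0:
    raise ValueError(f"Found {orphan_count} orders with unknown customers")
```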