Role : AWS PySpark Data Engineer
Location : Reston VA (Hybrid)
Type: Contract
Job Description:
We are seeking a highly skilled AWS PySpark Data Engineer to join our growing team. In this role, you will be responsible for designing, developing, and optimizing big data pipelines using AWS services and PySpark. You will work closely with data scientists, analysts, and other engineers to build scalable data architectures and drive business insights.
Key Responsibilities:
- Design and Develop Data Pipelines: Build scalable and efficient data pipelines using PySpark on AWS (Amazon EMR, AWS Glue, AWS Lambda).
- Data Transformation: Implement data transformations and cleansing using PySpark and AWS Glue.
- Cloud Integration: Leverage AWS services such as S3, Redshift, Athena, and Lambda to create data workflows.
- Data Modeling: Collaborate with the data architecture team to define data models for structured and unstructured data.
- Performance Tuning: Optimize PySpark code and AWS resource usage for high performance and cost efficiency.
- Collaborate with Cross-Functional Teams: Work with data scientists, analysts, and other engineers to support data-driven projects.
- ETL Development: Create ETL (Extract Transform Load) processes using AWS Glue and PySpark.
- Data Quality Assurance: Ensure data accuracy, integrity, and reliability across pipelines.
- Monitoring & Logging: Set up monitoring, logging, and alerting for data pipeline health and performance.
- Documentation: Maintain clear and comprehensive documentation for data pipelines and architecture.
Required Skills & Qualifications:
- Experience with AWS: Hands-on experience with AWS services such as S3, EC2, Lambda, Glue, Redshift, and Athena.
- PySpark Expertise: Solid experience in PySpark for data transformation, processing, and optimization.
- Big Data Technologies: Knowledge of big data frameworks and processing systems such as Apache Hadoop, Spark, and Kafka.
- ETL Development: Strong skills in designing and developing ETL pipelines using AWS Glue or other tools.
- Programming Skills: Proficiency in Python and SQL, with familiarity with Java/Scala.
- Data Modeling and Warehousing: Experience designing data models and building data warehouses (e.g., Amazon Redshift).
- Version Control: Familiarity with Git for version control.
- Cloud Security & Best Practices: Knowledge of security best practices, data encryption, and IAM roles in AWS.
Preferred Qualifications:
- Certification: AWS Certified Data Analytics - Specialty or similar.
- Experience with Kubernetes: Knowledge of deploying big data workloads using Kubernetes (EKS).
- Data Visualization: Experience with tools like Tableau, Power BI, or AWS QuickSight.
- Knowledge of CI/CD: Familiarity with Continuous Integration and Continuous Deployment practices in the data pipeline lifecycle.