Experience:
- 8 years of experience in data engineering, specifically in cloud environments such as AWS.
- Experience with PySpark for distributed data processing and with AWS Glue for ETL jobs and data management.
- Experience with AWS Data Pipeline (DPL) for workflow orchestration.
- Experience with AWS services such as S3, Lambda, Redshift, and RDS.

Skills:
- Proficiency in Python and PySpark for data processing and transformation.
- Strong understanding of ETL concepts and best practices.
- Experience with AWS Glue (ETL jobs, Data Catalog, and Crawlers).
- Experience building and maintaining data pipelines with AWS Data Pipeline or similar orchestration tools.
- Experience with AWS S3 for data storage and management, including common file formats (CSV, Parquet, Avro).
- Strong knowledge of SQL for querying and manipulating relational and semi-structured data.
- Experience with Data Warehousing and Big Data technologies, specifically within AWS.
- Experience with AWS Lambda for serverless data processing.
- Knowledge of AWS Redshift for data warehousing.
- Familiarity with Data Lakes, Amazon EMR, and Kinesis for streaming data processing.
- Understanding of data governance practices, including data lineage.
- Experience with CI/CD pipelines and Git for version control.
- Familiarity with Docker and containerization for building and deploying applications.

Responsibilities:
- Design and Build Data Pipelines: Design, implement, and optimize data pipelines on AWS using PySpark, AWS Glue, and AWS Data Pipeline to automate data integration, transformation, and storage.
- ETL Development: Develop and maintain Extract, Transform, and Load (ETL) processes using AWS Glue and PySpark to efficiently process large datasets. A minimal sketch of such an ETL step appears after this list.
- Workflow Automation: Build and manage automated data workflows using AWS Data Pipeline, ensuring seamless scheduling, monitoring, and management of data workflows.
- Data Integration: Work with different AWS data storage services (e.g., S3, Redshift, RDS) to ensure smooth integration and movement of data across systems.
- Performance and Scaling: Optimize and scale data pipelines for high performance and cost efficiency, utilizing AWS services like Lambda, S3, and EC2.
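For illustration, here is a minimal sketch of the kind of PySpark ETL step described in the ETL Development responsibility: read raw CSV from S3, apply a light transformation, and write partitioned Parquet back to S3. The bucket paths and column names are hypothetical placeholders, and the code assumes an environment with S3 access configured (for example, an AWS Glue or EMR job).

```python
# Sketch of an Extract-Transform-Load step with PySpark.
# Paths and columns are illustrative, not taken from this posting.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Extract: load raw CSV files from S3 (header row assumed).
raw = spark.read.option("header", True).csv("s3://example-raw-bucket/orders/")

# Transform: deduplicate, parse the date column, and derive a total.
cleaned = (
    raw.dropDuplicates()
       .withColumn("order_date", F.to_date("order_date"))
       .withColumn(
           "total",
           F.col("quantity").cast("double") * F.col("unit_price").cast("double"),
       )
)

# Load: write partitioned Parquet back to S3 for downstream querying.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)
```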