Role: Pyspark Developer
Responsibilities:
Proficient in the Python programming language and PySpark concepts.
Design, develop, and maintain PySpark applications for data processing, analysis, and transformation.
Strong understanding of Apache Spark architecture, RDDs, DataFrames, and Spark SQL.
Solid knowledge of data transformation and manipulation using PySpark libraries, functions, and SQL expressions.
Familiarity with distributed computing concepts and big data processing frameworks.
Solid understanding of databases and SQL (Oracle, PostgreSQL, and Snowflake).
Strong analytical and problem-solving skills, with the ability to troubleshoot and resolve data integration issues.
Ensure data quality and integrity by implementing data validation and error handling mechanisms in PySpark applications.
Excellent communication and interpersonal skills to collaborate effectively with cross-functional teams.
Design and implement data storage solutions using Amazon S3 buckets, considering scalability, security, and cost-efficiency.
Manage and organize data within S3 buckets, defining folder structures, access control policies, and data lifecycle management strategies.
Define and implement data schemas and structures using Avro and Parquet formats.