Senior PySpark Developer
Job Summary
We are looking for a Senior PySpark Developer with strong hands-on experience working on the Cloudera stack. The role involves building and maintaining large-scale data processing and ETL pipelines using PySpark. Experience in finance or banking environments is a plus.
Experience: 5 years
Key Responsibilities
Develop and maintain PySpark-based ETL pipelines on the Cloudera platform
Write efficient PySpark and Spark SQL transformations
Process large volumes of structured and semi-structured data
Optimize Spark jobs for performance and scalability
Create and manage Hive tables and partitions
Handle data validation, reconciliation, and error handling
Support production deployments and troubleshoot issues
Work closely with data engineers and business teams
Primary Skill (Mandatory)
Strong PySpark experience (5 years)
- DataFrame API
- Spark SQL
- Performance tuning (joins, partitions, shuffles)
- Batch data processing
Required Skills
Strong Python programming skills
Good SQL knowledge
Experience with large-scale data processing
Git or similar version control tools
Preferred / Nice-to-Have Skills (Not Mandatory)
Kafka for streaming or data ingestion
Starburst for distributed SQL querying
Oracle database integration or data extraction
Workflow tools such as Airflow or Oozie
Experience working with the Cloudera stack (HDFS, Hive, YARN, Spark)
Domain Experience (Plus)
Finance or Banking domain experience
Exposure to transactional, risk, or regulatory data
Education
Bachelor's degree in Computer Science or a related field (or equivalent professional experience)