Senior PySpark Developer
Job Summary
We are looking for a Senior PySpark Developer with strong hands-on experience working on the Cloudera stack. The role involves building and maintaining large-scale data processing and ETL pipelines using PySpark. Experience in finance or banking environments is a plus.
Experience: 5 years
Key Responsibilities
Develop and maintain PySpark-based ETL pipelines on the Cloudera platform
Write efficient PySpark and Spark SQL transformations
Process large volumes of structured and semi-structured data
Optimize Spark jobs for performance and scalability
Create and manage Hive tables and partitions
Handle data validation, reconciliation, and error handling
Support production deployments and troubleshoot issues
Work closely with data engineers and business teams
Primary Skill (Mandatory)
Strong PySpark experience (5 years)
- DataFrame API
- Spark SQL
- Performance tuning (joins, partitions, shuffles)
- Batch data processing
Required Skills
Strong Python programming skills
Good SQL knowledge
Experience with large-scale data processing
Git or similar version control tools
Preferred / Nice-to-Have Skills (Not Mandatory)
Kafka for streaming or data ingestion
Starburst for distributed SQL querying
Oracle database integration or data extraction
Workflow tools such as Airflow or Oozie
Experience working with the Cloudera stack (HDFS, Hive, YARN, Spark)
Domain Experience (Plus)
Finance or Banking domain experience
Exposure to transactional, risk, or regulatory data
Education
Bachelor's degree in Computer Science or a related field (or equivalent professional experience)