Senior Data Engineer – Python & PySpark

Purple Drive


Job Location:

Jersey, NJ - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary


We are seeking an experienced Senior Data Engineer with strong expertise in Python, PySpark, SQL, and Big Data technologies.

The ideal candidate will be responsible for designing, developing, and optimizing scalable data pipelines and ETL/ELT workflows that process large volumes of structured and unstructured data. The role requires hands-on experience with distributed data processing, cloud platforms, orchestration tools, and performance optimization of big data applications.


Key Responsibilities

Data Pipeline Development

  • Design, develop, and maintain scalable data pipelines using:
    • Python
    • Apache Spark / PySpark
  • Build reusable and efficient data processing frameworks.
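The posting does not prescribe an implementation, but the idea of a reusable processing framework can be sketched as a chain of small, composable steps in plain Python (the step names and records below are invented for illustration):

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]

def pipeline(*steps: Step) -> Step:
    """Compose independent steps into one reusable pipeline."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for step in steps:
            records = step(records)
        return records
    return run

# Hypothetical steps: drop incomplete rows, then normalise a field.
def drop_missing(records):
    return (r for r in records if r.get("id") is not None)

def uppercase_region(records):
    return ({**r, "region": r["region"].upper()} for r in records)

clean = pipeline(drop_missing, uppercase_region)
rows = [{"id": 1, "region": "nj"}, {"id": None, "region": "ca"}]
print(list(clean(rows)))  # [{'id': 1, 'region': 'NJ'}]
```

The same shape maps naturally onto PySpark, where each step would be a DataFrame-to-DataFrame transformation instead of a generator.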

ETL / ELT Development

  • Develop and optimize ETL/ELT workflows for:
    • Data ingestion
    • Data transformation
    • Data processing
  • Process large volumes of structured and unstructured data.

Big Data Processing

  • Work with big data technologies such as:
    • Hadoop ecosystem
    • Hive
    • Spark
  • Implement distributed computing solutions for high-performance processing.
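The MapReduce model behind Hadoop and Spark can be illustrated in miniature with plain Python: map each record to key/value pairs, shuffle them into groups by key, then reduce each group. This is a conceptual sketch only, not a distributed implementation:

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    # Emit (word, 1) for every word, like a Hadoop mapper.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, like the framework's shuffle stage.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum each group's values, like a reducer.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big wins"])))
print(counts)  # {'big': 2, 'data': 1, 'wins': 1}
```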

Data Modeling & Warehousing

  • Support:
    • Data modeling
    • Data architecture
    • Data warehousing solutions
  • Ensure scalability and maintainability of data systems.

SQL & Database Management

  • Write and optimize:
    • Complex SQL queries
    • Data transformation logic
  • Work with:
    • Relational databases
    • Non-relational databases
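As one concrete example of the kind of SQL involved, a window function ranking rows within a partition can be exercised against Python's built-in sqlite3 module (the table and columns here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# Rank each sale within its region by amount, highest first.
query = """
SELECT region, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
FROM sales
ORDER BY region, rnk
"""
for row in conn.execute(query):
    print(row)
```

The same query shape works in Spark SQL and most warehouse dialects.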

Cloud & Orchestration

  • Deploy and manage data solutions on cloud platforms such as:
    • AWS
    • Azure
    • GCP
  • Work with orchestration tools like:
    • Apache Airflow

Data Quality & Governance

  • Perform:
    • Data validation
    • Data cleansing
    • Data transformation
  • Ensure compliance with:
    • Data governance
    • Security standards
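A minimal shape for the validation and cleansing steps might look like the following (the rules and field names are assumptions for illustration, not a stated requirement):

```python
def validate(record, required=("id", "email")):
    """Return a list of rule violations; empty means the record passes."""
    errors = [f"missing {field}" for field in required if not record.get(field)]
    if record.get("email") and "@" not in record["email"]:
        errors.append("malformed email")
    return errors

def cleanse(record):
    """Trim whitespace and lowercase the email before loading."""
    out = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].lower()
    return out

rec = cleanse({"id": 7, "email": "  A@Example.com "})
print(rec, validate(rec))  # {'id': 7, 'email': 'a@example.com'} []
```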

Performance Optimization

  • Optimize:
    • Spark jobs
    • SQL queries
    • Data pipelines
  • Improve:
    • Scalability
    • Reliability
    • Processing performance

Collaboration & Agile Delivery

  • Collaborate with:
    • Data Analysts
    • Data Scientists
    • DevOps teams
    • Business stakeholders
  • Participate in:
    • Agile ceremonies
    • Sprint planning
    • Continuous improvement initiatives

Required Skills

Programming & Data Engineering

  • Python
  • PySpark
  • Apache Spark
  • SQL

Big Data Technologies

  • Hadoop ecosystem
  • Hive
  • Distributed computing platforms

ETL / ELT & Orchestration

  • ETL / ELT pipelines
  • Apache Airflow or similar orchestration tools

Cloud Platforms

  • AWS / Azure / GCP
  • Cloud-based data services

Databases & Data Warehousing

  • Relational databases
  • NoSQL databases
  • Data warehousing concepts
  • Data modeling

File Formats

  • Parquet
  • Avro
  • JSON
  • CSV
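Parquet and Avro require third-party libraries (for example pyarrow or fastavro), but CSV and JSON can be handled with the standard library alone; a small conversion sketch:

```python
import csv
import io
import json

# Parse a small CSV payload into dictionaries, then serialise as JSON.
raw = "id,region\n1,east\n2,west\n"
rows = list(csv.DictReader(io.StringIO(raw)))
payload = json.dumps(rows)
print(payload)
```

Note that csv.DictReader yields strings for every field; typed columnar formats like Parquet preserve schemas instead.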

Soft Skills

  • Strong analytical and troubleshooting skills
  • Excellent communication and collaboration abilities
  • Ability to work with cross-functional teams

Experience Required

  • 6-10 years of experience in:
    • Data Engineering
    • Big Data technologies
    • Distributed data processing

Preferred Skills

  • Performance tuning and optimization expertise
  • Experience with scalable cloud-native data architectures
  • Exposure to DevOps and CI/CD for data platforms