Senior Data Engineer – Python & PySpark

Purple Drive


Job Location:

Jersey, NJ - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary


We are seeking an experienced Senior Data Engineer with strong expertise in Python, PySpark, SQL, and Big Data technologies.

The ideal candidate will be responsible for designing, developing, and optimizing scalable data pipelines and ETL/ELT workflows that process large volumes of structured and unstructured data. The role requires hands-on experience with distributed data processing, cloud platforms, orchestration tools, and performance optimization of big data applications.


Key Responsibilities

Data Pipeline Development

  • Design, develop, and maintain scalable data pipelines using:
    • Python
    • Apache Spark / PySpark
  • Build reusable and efficient data processing frameworks.
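The posting does not prescribe an implementation, but the idea of a reusable processing framework can be sketched as a chain of small, composable steps in plain Python (the step names and records below are invented for illustration):

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]

def pipeline(*steps: Step) -> Step:
    """Compose independent steps into one reusable pipeline."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for step in steps:
            records = step(records)
        return records
    return run

# Hypothetical steps: drop incomplete rows, then normalise a field.
def drop_missing(records):
    return (r for r in records if r.get("id") is not None)

def uppercase_region(records):
    return ({**r, "region": r["region"].upper()} for r in records)

clean = pipeline(drop_missing, uppercase_region)
rows = [{"id": 1, "region": "nj"}, {"id": None, "region": "ca"}]
print(list(clean(rows)))  # [{'id': 1, 'region': 'NJ'}]
```

The same shape maps naturally onto PySpark, where each step would be a DataFrame-to-DataFrame transformation instead of a generator.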

ETL / ELT Development

  • Develop and optimize ETL/ELT workflows for:
    • Data ingestion
    • Data transformation
    • Data processing
  • Process large volumes of structured and unstructured data.

Big Data Processing

  • Work with big data technologies such as:
    • Hadoop ecosystem
    • Hive
    • Spark
  • Implement distributed computing solutions for high-performance processing.
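The MapReduce model behind Hadoop and Spark can be illustrated in miniature with plain Python: map each record to key/value pairs, shuffle them into groups by key, then reduce each group. This is a conceptual sketch only, not a distributed implementation:

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    # Emit (word, 1) for every word, like a Hadoop mapper.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, like the framework's shuffle stage.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum each group's values, like a reducer.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big wins"])))
print(counts)  # {'big': 2, 'data': 1, 'wins': 1}
```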

Data Modeling & Warehousing

  • Support:
    • Data modeling
    • Data architecture
    • Data warehousing solutions
  • Ensure scalability and maintainability of data systems.

SQL & Database Management

  • Write and optimize:
    • Complex SQL queries
    • Data transformation logic
  • Work with:
    • Relational databases
    • Non-relational databases
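As one concrete example of the kind of SQL involved, a window function ranking rows within a partition can be exercised against Python's built-in sqlite3 module (the table and columns here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# Rank each sale within its region by amount, highest first.
query = """
SELECT region, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
FROM sales
ORDER BY region, rnk
"""
for row in conn.execute(query):
    print(row)
```

The same query shape works in Spark SQL and most warehouse dialects.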

Cloud & Orchestration

  • Deploy and manage data solutions on cloud platforms such as:
    • AWS
    • Azure
    • GCP
  • Work with orchestration tools like:
    • Apache Airflow

Data Quality & Governance

  • Perform:
    • Data validation
    • Data cleansing
    • Data transformation
  • Ensure compliance with:
    • Data governance
    • Security standards
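A minimal shape for the validation and cleansing steps might look like the following (the rules and field names are assumptions for illustration, not a stated requirement):

```python
def validate(record, required=("id", "email")):
    """Return a list of rule violations; empty means the record passes."""
    errors = [f"missing {field}" for field in required if not record.get(field)]
    if record.get("email") and "@" not in record["email"]:
        errors.append("malformed email")
    return errors

def cleanse(record):
    """Trim whitespace and lowercase the email before loading."""
    out = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].lower()
    return out

rec = cleanse({"id": 7, "email": "  A@Example.com "})
print(rec, validate(rec))  # {'id': 7, 'email': 'a@example.com'} []
```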

Performance Optimization

  • Optimize:
    • Spark jobs
    • SQL queries
    • Data pipelines
  • Improve:
    • Scalability
    • Reliability
    • Processing performance

Collaboration & Agile Delivery

  • Collaborate with:
    • Data Analysts
    • Data Scientists
    • DevOps teams
    • Business stakeholders
  • Participate in:
    • Agile ceremonies
    • Sprint planning
    • Continuous improvement initiatives

Required Skills

Programming & Data Engineering

  • Python
  • PySpark
  • Apache Spark
  • SQL

Big Data Technologies

  • Hadoop ecosystem
  • Hive
  • Distributed computing platforms

ETL / ELT & Orchestration

  • ETL / ELT pipelines
  • Apache Airflow or similar orchestration tools

Cloud Platforms

  • AWS / Azure / GCP
  • Cloud-based data services

Databases & Data Warehousing

  • Relational databases
  • NoSQL databases
  • Data warehousing concepts
  • Data modeling

File Formats

  • Parquet
  • Avro
  • JSON
  • CSV
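Parquet and Avro require third-party libraries (for example pyarrow or fastavro), but CSV and JSON can be handled with the standard library alone; a small conversion sketch:

```python
import csv
import io
import json

# Parse a small CSV payload into dictionaries, then serialise as JSON.
raw = "id,region\n1,east\n2,west\n"
rows = list(csv.DictReader(io.StringIO(raw)))
payload = json.dumps(rows)
print(payload)
```

Note that csv.DictReader yields strings for every field; typed columnar formats like Parquet preserve schemas instead.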

Soft Skills

  • Strong analytical and troubleshooting skills
  • Excellent communication and collaboration abilities
  • Ability to work with cross-functional teams

Experience Required

  • 6-10 years of experience in:
    • Data Engineering
    • Big Data technologies
    • Distributed data processing

Preferred Skills

  • Performance tuning and optimization expertise
  • Experience with scalable cloud-native data architectures
  • Exposure to DevOps and CI/CD for data platforms