Salary Not Disclosed
1 Vacancy
Design & Development:
Develop and optimize PySpark applications for batch and real-time data processing.
Build scalable ETL/ELT pipelines using Databricks, Spark SQL, and Delta Lake.
Integrate data from multiple sources (databases, APIs, streaming platforms) into cloud-based data lakes/warehouses.
Implement data transformations, aggregations, and joins efficiently in distributed environments (a minimal sketch follows this list).
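For illustration, a minimal PySpark sketch of such a pipeline. All paths, table names, and columns here are hypothetical, assuming a Databricks-style environment with Delta Lake available:

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical example: paths, schemas, and names are illustrative only.
    spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

    # Extract: raw orders from the lake plus a customer dimension in Delta.
    orders = spark.read.json("s3://raw-zone/orders/")                  # assumed path
    customers = spark.read.format("delta").load("/mnt/dim/customers")  # assumed path

    # Transform: join, then aggregate daily revenue per customer segment.
    daily_revenue = (orders
        .join(customers, "customer_id")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "segment")
        .agg(F.sum("amount").alias("revenue")))

    # Load: write the result as a Delta table partitioned for downstream pruning.
    (daily_revenue.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("/mnt/curated/daily_revenue"))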
Performance Tuning & Optimization:
Optimize Spark jobs for performance (partitioning, caching, broadcast joins); a short tuning sketch follows this list.
Troubleshoot and resolve data skew, memory issues, and job failures.
Monitor and fine-tune Databricks clusters for cost efficiency.
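A hedged tuning sketch, assuming Spark 3.x and an illustrative fact/dimension pair: the broadcast hint keeps the large side shuffle-free, and Adaptive Query Execution lets Spark split skewed partitions at runtime.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Let AQE coalesce small partitions and split skewed ones (Spark 3+).
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

    events = spark.read.format("delta").load("/mnt/facts/events")      # hypothetical large fact table
    countries = spark.read.format("delta").load("/mnt/dim/countries")  # hypothetical small dimension

    # Broadcast the small dimension so the fact table is joined without a shuffle.
    joined = events.join(broadcast(countries), "country_code")

    # Cache only when the result feeds several downstream actions.
    joined.cache()
    joined.count()  # materializes the cache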
Cloud & Big Data Technologies:
Work with AWS (EMR, Glue, S3) or Azure (Synapse, Data Lake, ADF) for data storage and processing (see the cloud sketch after this list).
Implement data governance, security, and compliance best practices.
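As a sketch of the cloud side, assuming an AWS setup where the Glue Data Catalog backs the Spark metastore (as on EMR or Databricks); the database, table, and bucket names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cloud-sketch").getOrCreate()

    # Read a Glue-cataloged table by name; the catalog resolves its S3 location.
    events = spark.table("analytics_db.events")  # hypothetical database.table

    # Write curated output back to S3 as Delta, partitioned by date.
    (events.filter("event_date >= '2024-01-01'")
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("s3://curated-zone/events/"))  # hypothetical bucket

    # Governance example (Databricks table-ACL syntax, assumed available here).
    spark.sql("GRANT SELECT ON TABLE analytics_db.events TO `data_analysts`")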
Collaboration & Leadership:
Partner with data scientists, analysts, and business teams to deliver actionable insights.
Mentor junior developers and enforce coding standards, testing, and CI/CD practices.
5 years of hands-on PySpark development experience.
Strong expertise in Apache Spark (SQL, DataFrames, RDDs) and Databricks.
Proficiency in Python, SQL, and shell scripting.
Experience with cloud platforms (AWS/Azure) and big data tools (Hive, Kafka, Snowflake).
Knowledge of data modeling, partitioning, and performance optimization.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Financial services or banking industry experience.
Certifications in AWS/Azure, Databricks, or Spark.
Familiarity with CI/CD (Jenkins, GitLab) and infrastructure-as-code (Terraform).
Full Time