We are seeking a skilled and proactive Python / PySpark Developer to join our data engineering or analytics team. The ideal candidate will be responsible for building scalable data pipelines, performing large-scale data processing, and collaborating with data scientists, analysts, and business stakeholders.

Responsibilities:
- Design, develop, and optimize ETL data pipelines using PySpark on big data platforms (e.g. Hadoop, Databricks, EMR).
- Write clean, efficient, and modular code in Python for data processing and integration.
- Work with large datasets to extract insights, transform raw data, and ensure data quality.
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
- Perform performance tuning and debugging of PySpark jobs.
- Monitor and troubleshoot data workflows and batch jobs in production.
- Document solutions and maintain code repositories (e.g. Git).

Required Skills & Qualifications:
- Proficiency in Python, with experience building data-centric applications.
- Strong experience with PySpark and an understanding of Spark internals (RDDs, DataFrames, Spark SQL).
- Hands-on experience with the Hadoop ecosystem (e.g. Hive) or cloud-based big data platforms like AWS EMR, Azure Databricks, or GCP.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or similar.
- Strong understanding of SQL and relational databases.
- Experience with version control systems like Git.
- Strong problem-solving skills and the ability to work independently or in a team.
- Bachelor's degree in Computer Science, Engineering, or a related field.

Preferred Qualifications:
- Experience with CI/CD pipelines and DevOps practices.
- Knowledge of data warehousing and data lakes.
- Exposure to streaming technologies (e.g. Kafka, Spark Streaming).
- Familiarity with containerization tools like Docker or Kubernetes.
Required Experience:
Senior IC