Job Title: Python Developer with PySpark Experience (Direct Client Requirement; Local Candidates Only)
Work Location: Pasadena, CA (Hybrid, 3 Days Onsite Per Week)
Job Type: C2C (In-Person / F2F Interview; Local Candidates Only)
Job Summary:
We are looking for a skilled Python Developer with strong PySpark experience to join our data engineering team. You will be responsible for designing, developing, and optimizing large-scale data processing pipelines using Python and Apache Spark. The ideal candidate has a strong understanding of distributed computing principles and data wrangling techniques, and is comfortable working in cloud-based environments.
Key Responsibilities:
- Design and develop scalable data pipelines using PySpark and Python.
- Build ETL/ELT workflows to ingest, transform, and load structured and unstructured data.
- Optimize PySpark jobs for performance and scalability.
- Collaborate with data scientists, analysts, and product teams to understand data needs.
- Integrate data from various sources, including relational databases, APIs, and cloud storage.
- Implement data quality checks, validation, and monitoring systems.
- Deploy and manage jobs on big data platforms such as Hadoop, Databricks, or EMR.
- Write clean, maintainable, and well-documented code following best practices.
- Participate in code reviews and provide constructive feedback.
- Ensure adherence to data security and governance standards.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 3 years of experience in Python development.
- 2 years of hands-on experience with PySpark and Spark-based data processing.
- Strong understanding of data structures, algorithms, and distributed systems.
- Proficiency with SQL and experience working with relational databases.
- Experience with data pipeline orchestration tools such as Airflow, Oozie, or Luigi.
- Familiarity with cloud platforms (AWS, Azure, or GCP) and services such as S3, EMR, Databricks, or Glue.
- Strong debugging, performance tuning, and optimization skills.
Preferred Qualifications:
- Experience with CI/CD pipelines and containerization tools (Docker, Kubernetes).
- Knowledge of data warehousing concepts and tools such as Snowflake, Redshift, or BigQuery.
- Understanding of Delta Lake, Hive, HDFS, or Kafka.
- Experience working in Agile environments using tools such as JIRA, Confluence, or Git.
Please send your resume to: