PySpark Developer | Big Data, Cloud, SQL, Data Pipelines, DataOps

Synechron

Job Location:

Chennai - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Synechron is seeking a highly experienced PySpark Developer to lead the development and optimization of large-scale data processing workflows. This role involves designing, building, and maintaining robust data pipelines using PySpark, Hadoop, and related big data technologies to support enterprise analytics, machine learning, and data science initiatives. The ideal candidate will drive data engineering best practices, ensure data quality and performance, and collaborate with cross-functional teams to deliver scalable, reliable data solutions aligned with organizational goals.

Software Requirements

Required:

Proficiency in Python, PySpark, and Spark (version 2.4 or higher) for building scalable data pipelines.
Experience with Hadoop ecosystem components such as Hive, MapReduce, or HDFS.
Familiarity with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB).
Version control skills with Git.
Development, orchestration, and scheduling tools such as Apache Airflow, Jenkins, or GitHub Actions.
Experience with containerization tools like Docker.
Data management and transformation using tools such as Pandas and Dask.

Preferred:

Knowledge of cloud data platforms such as AWS or Azure.
Familiarity with DataOps practices, automation tools, and monitoring dashboards such as Prometheus or Grafana.

Overall Responsibilities

Design, develop, and optimize large-scale data processing pipelines using PySpark and Hadoop technologies.
Build efficient, reliable, and scalable data workflows following best practices for performance and data quality.
Implement data transformations, feature engineering, and validation techniques for data science applications.
Collaborate with data scientists, analysts, and product teams to gather requirements and deliver impactful data solutions.
Conduct performance tuning and troubleshooting to resolve data pipeline issues and inefficiencies.
Automate deployment, testing, and operational activities using CI/CD pipelines.
Maintain detailed documentation of data architectures, workflows, and operational procedures.
Support migration projects and cloud integrations to enhance data scalability and security.

Expected outcomes include high-performance data pipelines capable of handling high data volume and velocity with minimal downtime and high data integrity.

Technical Skills (By Category)

Programming Languages:

Essential: Python, PySpark (2.4 or higher)
Preferred: Java, Scala (for big data processing and integration)

Databases/Data Management:

Essential: SQL (PostgreSQL, MySQL), NoSQL (MongoDB or similar)
Preferred: Data warehousing solutions (Snowflake, Redshift)

Cloud Technologies:

Preferred: AWS (S3, EMR, Glue), Azure Data Factory, cloud-based data processing

Frameworks & Libraries:

Essential: PySpark, Hadoop ecosystem (Hive, MapReduce), Pandas
Preferred: Dask, TensorFlow, or other ML integration tools

Development & Automation Tools:

Essential: Git, Jenkins, CI/CD pipelines, Docker; Kubernetes (preferred)
Preferred: Terraform, CloudFormation, DataOps tools such as Airflow

Security & Compliance:

Awareness of data encryption, role-based access control, and GDPR and HIPAA compliance.

Experience Requirements

Minimum of 7 years of professional experience building large-scale data pipelines in production environments.
Hands-on expertise with PySpark, Hadoop, Spark, and data workflow orchestration tools.
Proven experience with data ingestion, transformation, and validation processes.
Experience supporting enterprise-scale data initiatives in finance, healthcare, retail, or similar sectors is preferred; relevant experience in other industries is also acceptable.
Strong troubleshooting, performance tuning, and optimization skills.

Day-to-Day Activities

Develop, test, and maintain scalable data pipelines using PySpark and related big data tools.
Optimize existing workflows and implement new features to improve data throughput and reliability.
Collaborate with data scientists, analysts, and engineers to understand data requirements and deliver effective solutions.
Monitor system performance with dashboards, troubleshoot bottlenecks, and resolve operational issues.
Automate deployment, testing, and data validation steps as part of CI/CD pipelines.
Document architecture, data flows, and operational procedures to support ongoing system maintenance and compliance.
Participate in team meetings, code reviews, and knowledge-sharing sessions.

The role involves technical leadership, problem-solving, and proactive communication to ensure data platform excellence.

Qualifications

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
5 years of experience supporting enterprise-level big data environments, with a focus on PySpark and Hadoop ecosystems.
Certifications such as Cloudera, Hortonworks, or AWS Data Analytics are advantageous.
Proven ability to lead technical projects, troubleshoot complex issues, and optimize performance.
Strong verbal and written communication skills with the ability to collaborate with diverse teams.

Professional Competencies

Strong analytical and problem-solving mindset focused on data quality, performance, and operational stability.
Leadership qualities to guide junior team members and influence best practices.
Effective communication skills for stakeholder engagement and reporting.
Adaptability to evolving big data technologies, cloud platforms, and business demands.
Ownership and initiative to implement continuous improvements.
Excellent time management and prioritization skills to handle multiple projects efficiently.

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference," is committed to fostering an inclusive culture that promotes equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants from diverse backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.
