PySpark Data Engineer | Big Data & Analytics

Synechron


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 8 hours ago
Vacancies: 1 Vacancy

Job Summary

Synechron is seeking an experienced PySpark Data Engineer / Data Scientist to lead data pipeline development and advanced analytics initiatives within our financial data and index analytics division. This role plays a crucial part in building scalable data processing solutions, enabling data-driven insights, and supporting machine learning workflows in both batch and streaming environments. The ideal candidate will possess a strong technical foundation in big data processing, analytics, and software engineering, along with leadership capabilities to drive impactful data projects.

Software Requirements

Required Skills:

  • Proven expertise in Python programming, emphasizing clean, maintainable, and scalable code

  • Hands-on experience with PySpark in both batch and streaming workflows

  • Deep knowledge of data manipulation and feature engineering, including Pandas, NumPy, and visualization libraries (matplotlib, seaborn)

  • Experience with Spark components like Spark SQL, DataFrames, and Spark MLlib

  • Familiarity with data storage solutions: SQL and NoSQL databases (e.g. Hive, Cassandra)

  • Knowledge of ETL tools such as Apache Airflow, Jenkins, or GitHub Actions for scheduling and automation

  • Experience working with cloud environments, especially Azure or AWS, for big data processing

Preferred Skills:

  • Hands-on experience with containerization and orchestration (Docker, Kubernetes)

  • Exposure to distributed storage solutions like Hadoop HDFS or Azure Data Lake

Overall Responsibilities

  • Design, develop, and optimize large-scale data pipelines using PySpark for structured, semi-structured, and unstructured data (5 years of experience expected)

  • Lead the building of ML pipelines for training, validation, and deployment of models in streaming and batch modes (5 years of experience expected)

  • Write high-quality, efficient code that supports data transformation, cleaning, and feature engineering

  • Collaborate with data scientists, analysts, and stakeholders to understand data requirements and deliver actionable insights

  • Build and maintain a reusable code base and automation scripts for data processing and model validation

  • Monitor pipeline performance, troubleshoot issues, and implement improvements to ensure robustness and scalability

  • Stay up to date with the latest in big data processing, ML techniques, and analytics tools to improve system efficiency and analytics capabilities

Technical Skills (By Category)

Programming Languages:

  • Required: Python, PySpark

  • Preferred: Scala, Java

Databases & Data Management:

  • SQL (MySQL, SQL Server), NoSQL (Cassandra, MongoDB), Hive, Data Lakes

Cloud Technologies:

  • Azure Data Factory, Azure Synapse, AWS Glue, S3 (preferred)

Frameworks & Libraries:

  • Spark MLlib, Pandas, NumPy, seaborn, matplotlib, scikit-learn (preferred)

Development Tools & Methodologies:

  • Jupyter, PyCharm, VSCode, Git, CI/CD (Jenkins, GitHub Actions), Airflow

Security & Data Governance:

  • Data privacy principles, secure data ingestion, and output compliance

Experience Requirements

  • 7-12 years of experience in data engineering, analytics, or data science roles, with significant hands-on experience in big data processing and ML pipelines

  • Proven track record of building scalable data pipelines and supporting ML workflows in enterprise environments

  • Experience working with structured, semi-structured, and unstructured data across financial domains

  • Previous leadership or mentorship experience in a technical team is preferred

Day-to-Day Activities

  • Develop and optimize data pipelines for financial and index data using PySpark and related tools

  • Build ML workflows, feature engineering, and model deployment pipelines in both streaming and batch environments

  • Collaborate with business analysts and data scientists to refine data requirements and deliver insights

  • Automate data ingestion, transformation, and validation processes

  • Monitor system performance, troubleshoot issues, and implement tuning activities

  • Review code and pipeline health with peer teams, and uphold best practices in software development and data security

Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Science, Mathematics, or a related field

  • Relevant certifications in big data, cloud platforms, or analytics (preferred)

  • Strong portfolio showcasing data pipeline projects, analytics solutions, and ML workflows

Professional Competencies

  • Critical thinking and analytical problem-solving skills

  • Excellent communication skills for technical and non-technical audiences

  • Leadership qualities to guide project execution and mentor junior team members

  • Adaptability to new tools, frameworks, and evolving project requirements

  • Ability to handle multiple priorities under pressure with a focus on quality and deadlines

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference", is committed to fostering an inclusive culture, promoting equality and diversity, and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Required Experience:

IC

About Company

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategies ...