Data Engineer – Cloud, Spark/Scala, ETL & AI Integration

Synechron


Job Location: Mumbai, India
Monthly Salary: Not Disclosed
Posted on: 23 hours ago
Vacancies: 1

Job Summary

Synechron is seeking a skilled ETL Developer with strong expertise in Hadoop ecosystems, Spark, and Informatica to design, develop, and maintain scalable data pipelines supporting enterprise analytics and data warehousing initiatives. This role involves working on large datasets, transforming data, and delivering reliable data integration solutions across on-premise and cloud environments. Your efforts will enable data-driven decision-making, ensure data quality, and support our organization's strategic focus on scalable and compliant data platforms.
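
To ground the summary above, here is a minimal batch ETL sketch in Spark/Scala of the kind of pipeline this role describes. It is illustrative only: the paths, schema, and transformation steps are assumptions, not details of Synechron's actual stack.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

// Minimal batch ETL sketch: extract raw CSV, clean and type it, load
// partitioned Parquet. All paths and column names are hypothetical.
object TransactionsEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transactions-etl")
      .getOrCreate()

    // Extract: raw files from a hypothetical landing zone
    val raw = spark.read
      .option("header", "true")
      .csv("s3a://landing-zone/transactions/*.csv")

    // Transform: type the amount, drop duplicate keys, and derive a
    // partition date from the event timestamp
    val cleaned = raw
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .dropDuplicates("transaction_id")
      .withColumn("event_date", F.to_date(F.col("event_ts")))

    // Load: partitioned Parquet for downstream analytics
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://warehouse/transactions/")

    spark.stop()
  }
}
```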


Software Requirements

Required:

  • Hands-on experience with ETL tools: Informatica, Talend, or equivalent (5 years)

  • Proven expertise in Hadoop ecosystem components: HDFS, Hive, Pig, Sqoop (5 years)

  • Proficiency in Apache Spark: PySpark, Spark SQL, Spark Streaming

  • Strong programming skills in Python, Java, or Scala for data processing (5 years)

  • Experience with SQL and relational databases: Oracle, MySQL, PostgreSQL

  • Familiarity with cloud data platforms such as AWS Redshift, Azure Synapse, and GCP BigQuery


Preferred:

  • Knowledge of cloud-native data migration and integration tools

  • Exposure to NoSQL databases like DynamoDB or Cassandra

  • Experience with data governance and metadata management tools


Overall Responsibilities

  • Design, develop, and optimize end-to-end ETL pipelines for large-scale data processing and integrations

  • Build and enhance batch and real-time data processing workflows using Spark, Hadoop, and cloud services (see the streaming sketch after this list)

  • Convert business and technical requirements into high-performance data solutions aligned with governance standards

  • Perform performance tuning, debugging, and optimization of data workflows and processing jobs

  • Ensure data quality, security, and compliance with enterprise standards and industry regulations

  • Collaborate with data analysts, data scientists, and application teams to maximize data usability and accuracy

  • Automate data ingestion, transformation, and deployment pipelines for operational efficiency

  • Support platform stability by troubleshooting issues, monitoring workflows, and maintaining data lineage

  • Implement and improve data governance, metadata management, and security standards

  • Stay current with emerging data technologies, automation frameworks, and cloud innovations to optimize data architectures
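
As a hedged illustration of the real-time workflows referenced in the list above, the sketch below uses Spark Structured Streaming to aggregate a hypothetical Kafka topic of JSON click events. The broker address, topic, fields, and sink paths are assumptions, and the spark-sql-kafka connector must be on the classpath.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

// Streaming sketch: consume an assumed Kafka topic, compute per-minute
// page counts with a watermark, and append results to Parquet.
object ClickstreamStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-stream")
      .getOrCreate()
    import spark.implicits._

    // Source: JSON click events from a hypothetical Kafka topic
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "clicks")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Parse the fields we need out of the JSON payload
    val parsed = events.select(
      F.get_json_object($"json", "$.page").as("page"),
      F.get_json_object($"json", "$.ts").cast("timestamp").as("ts"))

    // Aggregate per minute; the watermark bounds state for late data
    val counts = parsed
      .withWatermark("ts", "10 minutes")
      .groupBy(F.window($"ts", "1 minute"), $"page")
      .count()

    // Sink: append to Parquet with checkpointing for recovery
    counts.writeStream
      .outputMode("append")
      .format("parquet")
      .option("path", "s3a://warehouse/click_counts/")
      .option("checkpointLocation", "s3a://warehouse/_chk/click_counts/")
      .start()
      .awaitTermination()
  }
}
```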


Technical Skills (By Category)

Programming Languages (Essential):

  • Python, Scala, Java (for data processing and automation)


Preferred:

  • Additional scripting or programming skills (Shell, SQL scripting)


Frameworks & Libraries:

  • Spark (PySpark, Spark SQL, Spark Streaming), Hive, Pig

  • Data validation and governance tools (e.g., Atlas, data catalogs)

  • AI/ML frameworks such as LangChain, Hugging Face (preferred)


Databases & Storage:

  • Relational: Oracle, PostgreSQL, MySQL (see the JDBC sketch after this list)

  • NoSQL: DynamoDB, Cassandra (preferred)
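
As one hedged example of how the relational sources above are typically pulled into Spark, the sketch below does a partitioned JDBC read from PostgreSQL. The host, database, table, and bounds are placeholders, credentials come from the environment, and the PostgreSQL JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: parallel JDBC extract from a placeholder PostgreSQL table,
// landed as Parquet. Connection details are illustrative only.
object JdbcIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-ingest")
      .getOrCreate()

    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", sys.env("DB_USER"))        // avoid hard-coded secrets
      .option("password", sys.env("DB_PASSWORD"))
      // Partitioned reads split the extract across executors
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    orders.write.mode("overwrite").parquet("s3a://warehouse/orders/")
    spark.stop()
  }
}
```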


Cloud Technologies:

  • AWS: EMR, S3, Glue, CloudFormation, CDK, Redshift (preferred)

  • Azure or GCP data services (desired)


Data Management & Governance:

  • Metadata management, data lineage, and data quality frameworks
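
In practice, the data quality side of governance reduces to enforceable checks before a dataset is published. Below is a minimal, hypothetical gate in Spark/Scala; dedicated frameworks such as Deequ or Great Expectations layer metrics, history, and reporting on top of checks like these.

```scala
import org.apache.spark.sql.{DataFrame, functions => F}

// Hypothetical data-quality gate: hard assertions on emptiness,
// null keys, and key uniqueness before publishing a dataset.
object QualityChecks {
  def validate(df: DataFrame, keyCol: String): Unit = {
    val total = df.count()
    require(total > 0, "dataset is empty")

    // The key column must be fully populated...
    val nullKeys = df.filter(F.col(keyCol).isNull).count()
    require(nullKeys == 0, s"$nullKeys rows have a null $keyCol")

    // ...and unique across the dataset
    val distinctKeys = df.select(keyCol).distinct().count()
    require(distinctKeys == total, s"duplicate values found in $keyCol")
  }
}
```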


DevOps & Automation:

  • CI/CD tools: Jenkins, GitHub Actions, TeamCity

  • Infrastructure as Code: Terraform, CloudFormation, Ansible


Experience Requirements

  • 4 years of experience in designing and developing large-scale data pipelines

  • Proven expertise with Hadoop, Spark, and ETL frameworks in enterprise environments

  • Hands-on experience integrating data within cloud ecosystems and maintaining data quality

  • Familiarity with regulated industries such as finance or banking is preferred

  • Demonstrated ability to troubleshoot performance issues and optimize workflows


Day-to-Day Activities

  • Develop and maintain data pipelines supporting enterprise analytics and reporting

  • Optimize ETL workflows for performance, scalability, and data accuracy

  • Collaborate across teams to understand data requirements and implement technical solutions

  • Automate data processes and manage infrastructure provisioning using IaC tools

  • Monitor data processing jobs, troubleshoot incidents, and perform root cause analysis

  • Maintain documentation for data lineage, workflow configurations, and data security

  • Support migration and platform upgrade projects, ensuring minimal disruption

  • Stay updated on new data processing tools, cloud architecture, and compliance standards


Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field

  • 4 years managing large-scale data pipelines, preferably in cloud environments

  • Experience with the Hadoop ecosystem, Spark, and ETL tools in enterprise settings

  • Certifications such as AWS Data Analytics, Cloudera, or other relevant data platform certifications are advantageous


Professional Competencies

  • Strong analytical and troubleshooting skills in data processing contexts

  • Excellent collaboration and stakeholder management skills

  • Ability to work independently under deadlines and prioritize tasks effectively

  • Continuous learning mindset around emerging data, cloud, and AI/ML technologies

  • Focus on data quality, security, and scalability to meet industry standards

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, Same Difference, is committed to fostering an inclusive culture, promoting equality and diversity, and maintaining an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, more successful businesses as a global company. We encourage applicants from across diverse backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.



Required Experience: IC


Key Skills

  • APIs
  • Jenkins
  • REST
  • Python
  • SOAP
  • Systems Engineering
  • Service-Oriented Architecture
  • Java
  • XML
  • JSON
  • Scripting
  • SFTP

About Company

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver leading digital solutions. Progressive technologies and strategies ...