Cloud Data Engineer – Python, Spark/Scala, AWS & Data Warehousing

Synechron


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Synechron is seeking a highly skilled Data Engineer to lead the design, development, and optimization of enterprise data pipelines and analytics solutions. This role involves working on big data platforms, leveraging cloud-native AWS services, and integrating AI-driven applications to support business intelligence and operational excellence. You will develop scalable, secure, and high-performance data architectures, working closely with various stakeholders to meet data governance and compliance requirements and strategic objectives.


Software Requirements

Required:

  • Hands-on experience with Python, PySpark, and Scala for building scalable data processing jobs (4 years)

  • Expertise in big data platforms such as Spark, Hadoop, Hive, and related processing frameworks

  • Proven experience in end-to-end ETL pipeline development, including ingestion, transformation, and data validation

  • Strong familiarity with cloud data ecosystems on AWS, including EMR, S3, Glue, CloudFormation, CDK, and Data Pipeline

  • Experience with relational databases: PostgreSQL, Oracle, SQL Server

  • Experience working with NoSQL databases such as DynamoDB or Cassandra (preferred)

  • Working knowledge of metadata management, data lineage, and data governance tools
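For illustration, the ingestion → transformation → validation flow named above can be sketched in plain Python (a PySpark job would follow the same shape); the record fields and the quality rule here are invented for the example:

```python
# Hypothetical sketch of an ETL flow: ingest raw rows, transform types,
# then validate before loading. Field names and rules are illustrative only.

def ingest(rows):
    """Parse raw CSV-like rows into records."""
    return [dict(zip(("id", "amount"), r.split(","))) for r in rows]

def transform(records):
    """Cast string fields to their proper types."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in records]

def validate(records):
    """Drop records that fail a basic quality rule (non-negative amount)."""
    return [r for r in records if r["amount"] >= 0]

raw = ["a1,10.5", "a2,-3.0", "a3,7.25"]
clean = validate(transform(ingest(raw)))
# "a2" is dropped by the validation step; "a1" and "a3" pass through
```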


Preferred:

  • Exposure to AI/ML applications, especially Generative AI and frameworks such as LangChain, Llama, or Hugging Face

  • Familiarity with model management and prompt engineering

  • Knowledge of containerization (Docker) and Kubernetes for scalable deployment


Overall Responsibilities

  • Design, develop, and support data pipelines and architectures that enable enterprise analytics, reporting, and data governance

  • Build scalable batch and streaming data workflows supporting business-critical functions

  • Optimize the performance, latency, and throughput of data processing jobs through profiling and tuning

  • Implement security, privacy, and regulatory standards in data pipelines and repositories

  • Collaborate with data scientists, BI teams, and application teams to ensure data quality and availability

  • Automate data ingestion, transformation, and deployment processes for operational efficiency

  • Ensure high system reliability and availability, supporting infrastructure and platform monitoring

  • Lead or contribute to cloud migration and data modernization initiatives

  • Stay updated on emerging data technologies, automation, and AI/ML advancements to incorporate into existing platforms


Technical Skills (By Category)

Programming Languages (Essential):

  • Python, Scala, and PySpark for big data processing


Preferred:

  • Additional scripting languages such as Shell or Bash for automation


Frameworks & Libraries:

  • Spark and the Hadoop ecosystem (Hive; Kafka preferred)

  • Data validation, lineage, and governance tools

  • AI/ML frameworks such as LangChain and Hugging Face (preferred)


Databases & Data Storage:

  • Relational: PostgreSQL, Oracle, SQL Server

  • NoSQL: DynamoDB, Cassandra (preferred)


Cloud Technologies:

  • AWS: EMR, S3, Glue, CloudFormation, CDK, Lambda, Data Pipeline, CloudWatch


Data Governance & Security:

  • Metadata management, data lineage, security best practices, and compliance standards (PCI, GDPR)


Experience Requirements

  • 4 years designing and implementing enterprise data pipelines in cloud environments

  • Proven experience working with big data frameworks and AWS cloud solutions

  • Strong understanding of data governance, security standards, and regulatory compliance in enterprise contexts

  • Experience integrating AI/ML solutions or supporting AI workflows (preferred)

  • Past involvement in data migration, modernization, or platform automation projects


Day-to-Day Activities

  • Architect, develop, and optimize complex data pipelines and architectures supporting enterprise analytics

  • Implement data ingestion, transformation, and validation workflows across diverse data sources

  • Troubleshoot and resolve pipeline performance and security issues

  • Automate infrastructure provisioning, deployment, and management in cloud environments

  • Collaborate with data scientists, BI teams, and application developers to align data solutions with business goals

  • Monitor system health, perform root cause analysis, and implement efficiency improvements

  • Document data architecture, lineage, and governance procedures

  • Stay current on new tools, frameworks, and AI/ML innovations relevant to data engineering


Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field

  • 4 years of experience in cloud data engineering, big data processing, and ETL development

  • Proven track record of successfully supporting large-scale, secure, and compliant data solutions

  • Relevant certifications, such as AWS Data Analytics or Big Data certifications, are advantageous


Professional Competencies

  • Strong analytical and troubleshooting capabilities for data processing and pipeline issues

  • Excellent collaboration and stakeholder management skills

  • Leadership qualities for guiding junior team members and influencing best practices

  • Ability to adapt to evolving industry standards, tools, and compliance regulations

  • Results-driven, with a focus on data quality, security, and operational reliability

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, Same Difference, is committed to fostering an inclusive culture, promoting equality and diversity, and maintaining an environment that is respectful to all. We strongly believe that, as a global company, a diverse workforce helps build stronger, more successful businesses. We encourage applicants from diverse backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

Required Experience:

IC


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver best-in-class digital solutions. Progressive technologies and strategies ...
