Senior Data Engineer

CAI

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

Senior Data Engineer

Req number:

R6655

Employment type:

Full time

Worksite flexibility:

Remote

Who we are

CAI is a global technology services firm with over 8,500 associates worldwide and annual revenue of $1 billion. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right, whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.

Job Summary

We are looking for a motivated Data Engineer ready to take us to the next level! If you have strong experience building cloud-based data lake and analytics architectures using AWS and Databricks, are proficient in Python programming for data processing and automation, and are looking for your next career move, apply now.

Job Description

We are looking for a Data Engineer who has experience building data products using Databricks and related technologies. This is a full-time, remote position.

What You'll Do

  • Design, develop, and maintain data lakes and data pipelines on AWS using ETL frameworks and Databricks.

  • Integrate and transform large-scale data from multiple heterogeneous sources into a centralized data lake environment.

  • Implement and manage Delta Lake architecture using Databricks Delta or Apache Hudi.

  • Develop end-to-end data workflows using PySpark, Databricks Notebooks, and Python scripts for ingestion, transformation, and enrichment.

  • Design and develop data warehouses and data marts for analytical workloads using Snowflake, Redshift, or similar systems.

  • Design and evaluate data models (Star, Snowflake, Flattened) for analytical and transactional systems.

  • Optimize data storage, query performance, and cost across the AWS and Databricks ecosystem.

  • Build and maintain CI/CD pipelines for Databricks notebooks, jobs, and Python-based data processing scripts.

  • Collaborate with data scientists, analysts, and stakeholders to deliver high-performance, reusable data assets.

  • Maintain and manage code repositories (Git) and promote best practices in version control, testing, and deployment.

  • Participate in making major technical and architectural decisions for data engineering initiatives.

  • Monitor and troubleshoot Databricks clusters, Spark jobs, and ETL processes for performance and reliability.

  • Coordinate with business and technical teams through all phases of the software development life cycle.

What You'll Need

Required

  • 5 years of experience building and managing data lake architectures on the AWS cloud.

  • 3 years of experience with AWS data services such as S3, Glue, Lake Formation, EMR, Kinesis, RDS, DMS, and Redshift.

  • 3 years of experience building data warehouses on Snowflake, Redshift, HANA, Teradata, or Exasol.

  • 3 years of hands-on experience working with Apache Spark or PySpark on Databricks.

  • 3 years of experience implementing Delta Lakes using Databricks Delta or Apache Hudi.

  • 3 years of experience in ETL development using Databricks, AWS Glue, or other modern frameworks.

  • Proficiency in Python for data engineering, automation, and API integrations.

  • Experience with Databricks Jobs, Workflows, and cluster management.

  • Experience with CI/CD pipelines and Infrastructure as Code (IaC) tools such as Terraform or CloudFormation is a plus.

  • Bachelor's degree in Computer Science, Information Technology, Data Science, or a related field.

  • Experience working on Agile projects and with Agile methodology in general.


Preferred

  • Strong SQL, RDBMS, and data modeling skills.

  • Experience with Databricks Unity Catalog, Delta Live Tables (DLT), and MLflow for data governance and the model lifecycle.

  • AWS or Databricks cloud certifications (e.g., AWS Data Analytics Specialty, Databricks Certified Data Engineer Professional) are a big plus.

  • Understanding of data security, access control, and compliance in cloud environments.

  • Strong analytical, problem-solving, and communication skills.

Physical Demands

  • This role involves mostly sedentary work, with occasional movement around the office to attend meetings, etc.

  • Ability to perform repetitive tasks on a computer using a mouse, keyboard, and monitor.

Reasonable accommodation statement

If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to or (888).


Required Experience:

Senior IC


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company


CAI helps organizations leverage technology, people, and processes to solve business problems, enable savings, and spur innovation.
