Job Summary
We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.
Key Responsibilities
ETL Pipeline Development & Optimization
Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize data pipelines for performance, scalability, fault tolerance, and reliability.
Big Data Processing
Develop and optimize batch and real-time data processing solutions using Apache Spark (PySpark/Scala) and Apache Kafka.
Ensure fault-tolerant, scalable, and high-performance data processing systems.
Cloud Infrastructure Development
Build and manage scalable cloud-native data infrastructure on AWS.
Design resilient and cost-efficient data pipelines adaptable to varying data volumes and formats.
Real-Time & Batch Data Integration
Enable seamless ingestion and processing of real-time streaming and batch data sources (MSK).
Ensure consistency, data quality, and a unified view across multiple data sources and formats.
Data Analysis & Insights
Partner with business teams and data scientists to understand data requirements.
Perform in-depth data analysis to identify trends, patterns, and anomalies.
Deliver high-quality datasets and present actionable insights to stakeholders.
CI/CD & Automation
Implement and maintain CI/CD pipelines using Jenkins or similar tools.
Automate testing, deployment, and monitoring to ensure smooth production releases.
Data Security & Compliance
Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).
Implement data governance practices, ensuring data integrity, security, and traceability.
Troubleshooting & Performance Tuning
Identify and resolve performance bottlenecks in data pipelines.
Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.
Collaboration & Cross-Functional Work
Work closely with engineers, data scientists, product managers, and business stakeholders.
Participate in agile ceremonies, sprint planning, and architectural discussions.
Skills & Qualifications
Mandatory (Must-Have) Skills
AWS Expertise
Hands-on experience with AWS Big Data services such as EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, and EC2.
Strong understanding of cloud-native data architectures.
Big Data Technologies
Proficiency in PySpark or Scala Spark and SQL for large-scale data transformation and analysis.
Experience with Apache Spark and Apache Kafka in production environments.
Data Frameworks
Strong knowledge of Spark DataFrames and Datasets.
ETL Pipeline Development
Proven experience in building scalable and reliable ETL pipelines for both batch and real-time data processing.
Database Modeling & Data Warehousing
Expertise in designing scalable data models for OLAP and OLTP systems.
Data Analysis & Insights
Ability to perform complex data analysis and extract actionable business insights.
Strong analytical and problem-solving skills with a data-driven mindset.
CI/CD & Automation
Basic to intermediate experience with CI/CD pipelines using Jenkins or similar tools.
Familiarity with automated testing and deployment workflows.
Good-to-Have (Preferred) Skills
Knowledge of Java for data processing applications.
Experience with NoSQL databases (e.g., DynamoDB, Cassandra, MongoDB).
Familiarity with data governance frameworks and compliance tooling.
Experience with monitoring and observability tools such as AWS CloudWatch, Splunk, or Dynatrace.
Exposure to cost optimization strategies for large-scale cloud data platforms.
Required Skills:
Splunk, AWS, Spark, OLTP