Job Summary
We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.
Key Responsibilities
ETL Pipeline Development & Optimization
Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize data pipelines for performance, scalability, fault tolerance, and reliability.
Big Data Processing
Develop and optimize batch and real-time data processing solutions using Apache Spark (PySpark/Scala) and Apache Kafka.
Ensure fault-tolerant, scalable, and high-performance data processing systems.
Cloud Infrastructure Development
Build and manage scalable cloud-native data infrastructure on AWS.
Design resilient and cost-efficient data pipelines adaptable to varying data volumes and formats.
Real-Time & Batch Data Integration
Enable seamless ingestion and processing of real-time streaming and batch data sources (MSK).
Ensure consistency, data quality, and a unified view across multiple data sources and formats.
Data Analysis & Insights
Partner with business teams and data scientists to understand data requirements.
Perform in-depth data analysis to identify trends, patterns, and anomalies.
Deliver high-quality datasets and present actionable insights to stakeholders.
CI/CD & Automation
Implement and maintain CI/CD pipelines using Jenkins or similar tools.
Automate testing, deployment, and monitoring to ensure smooth production releases.
Data Security & Compliance
Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).
Implement data governance practices, ensuring data integrity, security, and traceability.
Troubleshooting & Performance Tuning
Identify and resolve performance bottlenecks in data pipelines.
Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.
Collaboration & Cross-Functional Work
Work closely with engineers, data scientists, product managers, and business stakeholders.
Participate in agile ceremonies, sprint planning, and architectural discussions.
Skills & Qualifications
Mandatory (Must-Have) Skills
AWS Expertise
Hands-on experience with AWS Big Data services such as EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, and EC2.
Strong understanding of cloud-native data architectures.
Big Data Technologies
Proficiency in PySpark or Scala Spark and SQL for large-scale data transformation and analysis.
Experience with Apache Spark and Apache Kafka in production environments.
Data Frameworks
Strong knowledge of Spark DataFrames and Datasets.
ETL Pipeline Development
Proven experience in building scalable and reliable ETL pipelines for both batch and real-time data processing.
Database Modeling & Data Warehousing
Expertise in designing scalable data models for OLAP and OLTP systems.
Data Analysis & Insights
Ability to perform complex data analysis and extract actionable business insights.
Strong analytical and problem-solving skills with a data-driven mindset.
CI/CD & Automation
Basic to intermediate experience with CI/CD pipelines using Jenkins or similar tools.
Familiarity with automated testing and deployment workflows.
Good-to-Have (Preferred) Skills
Knowledge of Java for data processing applications.
Experience with NoSQL databases (e.g., DynamoDB, Cassandra, MongoDB).
Familiarity with data governance frameworks and compliance tooling.
Experience with monitoring and observability tools such as AWS CloudWatch, Splunk, or Dynatrace.
Exposure to cost optimization strategies for large-scale cloud data platforms.
Required Skills:
Splunk, AWS, Spark, OLTP