Data Engineer

Employer Active

Job Location

India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

We are implementing a Media Mix Optimization (MMO) platform designed to analyze and optimize marketing investments across multiple channels. This initiative requires a robust on-premises data infrastructure to support distributed computing, large-scale data ingestion, and advanced analytics. The Data Engineer will be responsible for building and maintaining resilient pipelines and data systems that feed into MMO models, ensuring data quality, governance, and availability for Data Science and BI teams. The environment integrates HDFS for distributed storage, Apache NiFi for orchestration, Hive and PySpark for distributed processing, and Postgres for structured data management.

This role is central to enabling seamless integration of massive datasets from disparate sources (media, campaign, transaction, customer interaction, etc.), standardizing data, and providing reliable foundations for advanced econometric modeling and insights.

Responsibilities:

 

Data Pipeline Development & Orchestration
o Design, build, and optimize scalable data pipelines in Apache NiFi to automate ingestion, cleansing, and enrichment from structured, semi-structured, and unstructured sources (a minimal cleansing sketch follows below).
o Ensure pipelines meet low-latency and high-throughput requirements for distributed processing.
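
For illustration only, a minimal PySpark sketch of the kind of cleansing and standardization step these pipelines hand off to; the HDFS paths, column names, and quality rules below are assumptions, not details of the actual platform:

    # Minimal cleansing/standardization sketch (paths, columns, and rules are assumed).
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("campaign_cleansing")
             .enableHiveSupport()
             .getOrCreate())

    # Semi-structured campaign events landed on HDFS by the orchestration layer.
    raw = spark.read.json("hdfs:///landing/media/campaign_events/")  # hypothetical path

    clean = (raw
             .withColumn("event_ts", F.to_timestamp("event_ts"))        # normalize timestamps
             .withColumn("channel", F.lower(F.trim(F.col("channel"))))  # standardize channel labels
             .dropDuplicates(["event_id"])                              # drop replayed records
             .filter(F.col("spend").isNotNull() & (F.col("spend") >= 0)))  # basic quality gate

    # Persist the standardized layer for downstream processing and BI consumption.
    clean.write.mode("overwrite").parquet("hdfs:///curated/media/campaign_events/")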

Data Storage & Processing
o Architect and manage datasets on HDFS to support high-volume, fault-tolerant storage.
o Develop distributed processing workflows in PySpark and Hive to handle large-scale transformations, aggregations, and joins across petabyte-level datasets.
o Implement partitioning, bucketing, and indexing strategies to optimize query performance (see the sketch below).
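
As a rough illustration of the partitioning and bucketing strategies mentioned above, a PySpark sketch that writes a Hive-managed table partitioned by date and bucketed on a join key; the table and column names are illustrative assumptions:

    # Hypothetical load: partition by event date, bucket on customer_id for join-heavy queries.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("curated_load")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.table("staging.transactions")  # assumed staging table

    (df.write
       .mode("overwrite")
       .format("parquet")
       .partitionBy("event_date")             # lets queries prune partitions on date filters
       .bucketBy(64, "customer_id")           # co-locates rows sharing the join key
       .sortBy("customer_id")
       .saveAsTable("curated.transactions"))  # Hive-managed table (assumed name)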

Database Engineering & Management
o Maintain and tune Postgres databases for high availability, integrity, and performance.
o Write advanced SQL queries for ETL, analysis, and integration with downstream BI/analytics systems (an illustrative query follows below).
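
By way of example, a small Python/psycopg2 sketch of the kind of ETL query this might involve against Postgres; the connection details, tables, and columns are illustrative assumptions (the upsert also presumes a unique constraint on the conflict columns):

    # Illustrative upsert from a staging table into a reporting table (names assumed).
    import psycopg2

    UPSERT_DAILY_SPEND = """
        INSERT INTO reporting.daily_channel_spend (channel, spend_date, spend)
        SELECT channel, spend_date, SUM(spend)
        FROM staging.campaign_events
        GROUP BY channel, spend_date
        ON CONFLICT (channel, spend_date)
        DO UPDATE SET spend = EXCLUDED.spend;
    """

    conn = psycopg2.connect(host="localhost", dbname="mmo", user="etl")  # assumed connection
    try:
        # The connection context manager wraps the statement in a transaction
        # and commits on a clean exit.
        with conn, conn.cursor() as cur:
            cur.execute(UPSERT_DAILY_SPEND)
    finally:
        conn.close()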

Collaboration & Integration
o Partner with Data Scientists to deliver clean, reliable datasets for model training and MMO analysis.
o Work with BI engineers to ensure data pipelines align with reporting and visualization requirements.

Monitoring & Reliability Engineering
o Implement monitoring, logging, and alerting frameworks to track data pipeline health (a minimal health-check sketch follows below).
o Troubleshoot and resolve issues in ingestion, transformations, and distributed jobs.
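
As a sketch of the kind of health check such a framework might run (the table name, volume threshold, and alerting hook are assumptions):

    # Hypothetical freshness/volume check for a curated dataset; alerting hook is a placeholder.
    import logging
    from pyspark.sql import SparkSession, functions as F

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline_health")

    MIN_EXPECTED_ROWS = 1_000_000  # assumed volume floor for a daily load

    spark = (SparkSession.builder
             .appName("pipeline_health")
             .enableHiveSupport()
             .getOrCreate())

    stats = (spark.table("curated.transactions")  # assumed table
             .agg(F.count(F.lit(1)).alias("rows"),
                  F.max("event_date").alias("latest_partition"))
             .first())

    if stats["rows"] < MIN_EXPECTED_ROWS:
        log.error("Row count %s below threshold %s", stats["rows"], MIN_EXPECTED_ROWS)
        # Raise or page here, e.g. push to the team's alerting system.
    else:
        log.info("Load OK: %s rows, latest partition %s", stats["rows"], stats["latest_partition"])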

Data Governance & Compliance
o Enforce standards for data quality, lineage, and security across systems.
o Ensure compliance with internal governance and external regulations.

Documentation & Knowledge Transfer
o Develop and maintain comprehensive technical documentation for pipelines, data models, and workflows.
o Provide knowledge sharing and onboarding support for cross-functional teams.

 


Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field (Master's preferred).

  • Proven experience as a Data Engineer with expertise in HDFS, Apache NiFi, Hive, PySpark, Postgres, Python, and SQL.

  • Strong background in ETL/ELT design, distributed processing, and relational database management.

  • Experience with on-premises big data ecosystems supporting distributed computing.

  • Solid debugging, optimization, and performance tuning skills.

  • Ability to work in agile environments, collaborating with multi-disciplinary teams.

  • Strong communication skills for cross-functional technical discussions.

Preferred Qualifications:

  • Familiarity with data governance frameworks, lineage tracking, and data cataloging tools.

  • Knowledge of security standards, encryption, and access control in on-premises environments.

  • Prior experience with Media Mix Modeling (MMM/MMO) or marketing analytics projects.

  • Exposure to workflow schedulers (Airflow, Oozie, or similar).

  • Proficiency in developing automation scripts and frameworks in Python for CI/CD of data pipelines.


Remote Work:

Yes


Employment Type:

Full-time


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala
