Data Engineer

Employer Active

1 Vacancy
The job posting is outdated and position may be filled

Job Location

Reston, VA - USA

Monthly Salary

Not Disclosed

Job Description

Role: Data Engineer

We are seeking a highly skilled Data Engineer to set up Change Data Capture (CDC) for multiple database types to support data lake hydration. The ideal candidate should have hands-on experience with Debezium or other CDC frameworks and strong expertise in ETL transformations using Apache Spark for both streaming and batch data processing.

Key Responsibilities:

  • Implement Change Data Capture (CDC) for diverse databases to enable real-time and batch data ingestion.
  • Develop ETL pipelines using Apache Spark (PySpark/Java) to transform raw CDC data into structured, analytics-ready datasets.
  • Work with Apache Spark DataFrames, Spark SQL, and Spark Streaming to build scalable data pipelines.
  • Optimize data workflows for performance, reliability, and scalability in a big data environment.
  • Utilize Apache Airflow to orchestrate data pipelines and schedule workflows.
  • Leverage AWS services for data ingestion, storage, transformation, and processing (e.g., S3, Glue, EMR, Lambda, Step Functions, MWAA).
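
To illustrate the kind of work involved in turning raw CDC data into a structured dataset, here is a minimal sketch in plain Python. It assumes change events shaped like Debezium's envelope (`op`, `before`, `after`); the function name `apply_cdc_events`, the `id` key, and the sample rows are hypothetical and only for illustration. A production pipeline for this role would more likely express the same logic in Spark.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc_events(events: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, Row]:
    """Fold an ordered stream of Debezium-style change events into the latest row state.

    Assumes each event carries `op` ("c" create, "r" snapshot read, "u" update,
    "d" delete) plus `before`/`after` row images, and that events arrive in commit order.
    """
    state: Dict[Any, Row] = {}
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # Inserts, snapshot reads, and updates all upsert the "after" image.
            row = event["after"]
            state[row[key]] = row
        elif op == "d":
            # Deletes only carry a "before" image; drop the row if present.
            state.pop(event["before"][key], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return state


# Hypothetical sample events, for illustration only.
sample_events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": {"id": 1, "name": "Ada"}, "after": {"id": 1, "name": "Ada L."}},
    {"op": "d", "before": {"id": 2, "name": "Bob"}, "after": None},
]

latest = apply_cdc_events(sample_events)
print(latest)  # {1: {'id': 1, 'name': 'Ada L.'}}
```

The sketch keeps only the most recent image per key, which is the simplest form of the upsert-and-delete merge used when hydrating a data lake from CDC.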

Required Skills:

  • Java: Mid- to senior-level experience.
  • Python (PySpark): Mid-level experience.
  • Apache Spark: Proficiency in DataFrames, Spark SQL, Spark Streaming, and ETL pipelines.
  • Apache Airflow: Experience managing and scheduling workflows.
  • AWS Expertise:
    • S3 (CRUD operations)
    • EMR & EMR Serverless
    • Glue Data Catalog
    • Step Functions
    • MWAA (Managed Workflows for Apache Airflow)
    • AWS Lambda (Python-based)
    • AWS Batch

Nice-to-Have Skills (Bonus):

  • Scala for Spark development.
  • Apache Hudi for incremental data processing and ACID transactions.
  • Apache Griffin for data quality and validation.
  • Performance tuning and optimization in big data environments.
  • AWS Deequ (not required, but a plus).

Employment Type

Full Time
