Data Engineer – Gen AI

QTech


Job Location: Edison, NJ, USA

Monthly Salary: Not Disclosed
Posted on: 15-10-2025
Vacancies: 1 Vacancy

Job Summary

Job Title: Data Engineer – Gen AI
Location: Edison, NJ
Domain: IT Services
Duration: Long Term Contract
Looking for W2 candidates only. No C2C.

Responsibilities:

  • Design, build, and maintain scalable data pipelines to support Generative AI and LLM-based applications.
  • Collect, clean, and preprocess structured and unstructured data for model training, fine-tuning, and retrieval-augmented generation (RAG).
  • Implement robust data ingestion frameworks that integrate APIs, streaming sources, and external repositories.
  • Collaborate with AI/ML teams to deliver high-quality, domain-specific datasets optimized for transformer-based architectures.
  • Architect and manage vector databases (e.g., Pinecone, FAISS, Weaviate) for efficient embedding storage and semantic search.
  • Optimize data storage, retrieval, and transformation workflows across multi-cloud and hybrid environments.
  • Automate data versioning, lineage tracking, and governance processes to ensure compliance and reproducibility.
  • Build scalable ETL/ELT frameworks and orchestrate workflows using Airflow, Prefect, or Dagster.
  • Contribute to prompt engineering and model evaluation pipelines through metadata enrichment and contextual data provisioning.
  • Ensure data quality, privacy, and ethical-use standards across all Generative AI applications.

Qualifications:

  • 8 years of professional experience in Data Engineering; 2 years supporting AI/ML or Generative AI workflows.
  • Proficiency in Python, SQL, and distributed data processing frameworks (Spark, PySpark, Dask).
  • Strong experience with data pipeline orchestration tools (Airflow, Luigi, Dagster, or Prefect).
  • Hands-on experience with cloud data ecosystems such as AWS (Glue, Redshift, S3), Azure Data Factory, or GCP BigQuery.
  • Knowledge of vector databases and embedding models for RAG-based systems.
  • Familiarity with LangChain, LLMOps, and data preparation for fine-tuning LLMs.
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Working knowledge of API integration, data governance, and data cataloging tools (e.g., DataHub, Amundsen).
  • Exposure to Generative AI concepts such as embeddings, tokenization, and prompt optimization.
  • Understanding of Responsible AI practices, data anonymization, and bias mitigation techniques.

Best regards,
Tarun K
Phone: 1-
Email:


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala