Job Title: Data Engineer Gen AI
Location: Edison, NJ
Domain: IT Services
Duration: Long Term Contract
Looking for W2 candidates. No C2C.
Responsibilities:
- Design, build, and maintain scalable data pipelines to support Generative AI and LLM-based applications.
- Collect, clean, and preprocess structured and unstructured data for model training, fine-tuning, and retrieval-augmented generation (RAG).
- Implement robust data ingestion frameworks integrating APIs, streaming sources, and external repositories.
- Collaborate with AI/ML teams to deliver high-quality, domain-specific datasets optimized for transformer-based architectures.
- Architect and manage vector databases (e.g., Pinecone, FAISS, Weaviate) for efficient embedding storage and semantic search.
- Optimize data storage, retrieval, and transformation workflows across multi-cloud and hybrid environments.
- Automate data versioning, lineage tracking, and governance processes to ensure compliance and reproducibility.
- Build scalable ETL/ELT frameworks and orchestrate workflows using Airflow, Prefect, or Dagster.
- Contribute to prompt engineering and model evaluation pipelines through metadata enrichment and contextual data provisioning.
- Ensure data quality, privacy, and ethical use standards across all Generative AI applications.
Qualifications:
- 8 years of professional experience in Data Engineering; 2 years supporting AI/ML or Generative AI workflows.
- Proficiency in Python, SQL, and distributed data processing frameworks (Spark, PySpark, Dask).
- Strong experience with data pipeline orchestration tools (Airflow, Luigi, Dagster, or Prefect).
- Hands-on experience with cloud data ecosystems such as AWS (Glue, Redshift, S3), Azure Data Factory, or GCP BigQuery.
- Knowledge of vector databases and embedding models for RAG-based systems.
- Familiarity with LangChain, LLMOps, and data preparation for fine-tuning LLMs.
- Experience in containerization and orchestration (Docker, Kubernetes).
- Working knowledge of API integration, data governance, and data cataloging tools (e.g., DataHub, Amundsen).
- Exposure to Generative AI concepts such as embeddings, tokenization, and prompt optimization.
- Understanding of Responsible AI practices, data anonymization, and bias mitigation techniques.
Best Regards
Tarun K
Phone: 1-
Email: