GenAI Data Engineer with Azure Databricks

Atlanta, GA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

We are seeking a skilled GenAI Data Engineer with expertise in Databricks to design, build, and optimize data pipelines that power advanced analytics, machine learning, and Generative AI solutions. The ideal candidate will have strong data engineering experience, proficiency with Databricks, and hands-on exposure to large-scale AI/ML systems.

Key Responsibilities

Design, develop, and maintain data pipelines and workflows using Databricks, Spark, and Delta Lake.
Work with structured and unstructured data to build training datasets for Generative AI models.
Optimize data ingestion, transformation, and storage for large-scale AI/ML workloads.
Collaborate with data scientists, ML engineers, and AI researchers to enable model training and deployment.
Implement MLOps practices for scaling AI/GenAI models in production.
Ensure data quality, governance, and compliance across data pipelines.
Integrate LLMs (Large Language Models) with enterprise datasets for business use cases.
Monitor and fine-tune Databricks clusters, jobs, and performance metrics.
Contribute to cloud-native architectures (Azure/AWS/GCP) supporting AI/ML workloads.

Required Qualifications

Bachelor's or master's degree in Computer Science, Data Engineering, or a related field.
5 years of experience in data engineering and ETL pipeline development.
Hands-on expertise in Databricks, Apache Spark, Delta Lake, and MLflow.
Strong proficiency in Python, SQL, and PySpark.
Experience with Generative AI frameworks (Hugging Face, LangChain, OpenAI APIs, or similar).
Knowledge of cloud platforms (Azure Data Lake, AWS S3, GCP BigQuery).
Experience with MLOps/AI Ops pipelines.
Familiarity with vector databases (e.g., Pinecone, Weaviate, FAISS) is a plus.

Preferred Skills

Experience deploying LLMs on Databricks.
Knowledge of data governance and security best practices.
Familiarity with REST APIs, microservices, and containerized deployments (Docker/Kubernetes).
Strong problem-solving and collaboration skills in cross-functional AI/ML teams.
