Job Title: Data Scientist
Location: Atlanta, GA (Remote)
Job Type: Long-Term Contract
About the Role:
We are seeking a highly motivated and skilled Data Scientist with strong expertise in data science fundamentals, machine learning (ML), and large language models (LLMs). The ideal candidate will have hands-on experience working with Databricks and Azure ecosystems, including PySpark for data processing and LLM tuning within Databricks. This role involves building and optimizing data science solutions that leverage cloud-based technologies to deliver business value.
Key Responsibilities:
- Design, develop, and deploy data science and ML solutions on Databricks (Azure environment).
- Work on the end-to-end ML lifecycle, from data preparation and feature engineering to model training, evaluation, and deployment.
- Apply LLM fine-tuning and optimization techniques within Databricks for domain-specific use cases.
- Utilize PySpark for distributed data processing, cleaning, and transformation.
- Collaborate with data engineers, cloud architects, and business stakeholders to ensure seamless integration of ML models into production workflows.
- Conduct exploratory data analysis (EDA), statistical modeling, and hypothesis testing to extract insights from structured and unstructured data.
- Stay updated on the latest advancements in AI/ML, LLMs, and Databricks capabilities to bring innovative solutions.
- Document methodologies, experiments, and best practices for knowledge sharing.
Required Skills & Qualifications:
- Bachelor's/Master's degree in Computer Science, Data Science, Statistics, AI/ML, or a related field.
- Proven experience as a Data Scientist with exposure to ML and NLP projects.
- Strong hands-on experience with Databricks on Azure (MLflow, Delta Lake, Databricks ML).
- Proficiency in PySpark for large-scale data processing.
- Experience in training, fine-tuning, and deploying LLMs within a Databricks environment.
- Strong programming skills in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit-learn, Hugging Face).
- Solid understanding of data science workflows: data wrangling, feature engineering, model development, and evaluation.
- Working knowledge of Azure cloud services (Azure Data Lake, Azure Synapse, Azure ML).
- Strong problem-solving, analytical thinking, and communication skills.
Good-to-Have Skills:
- Experience with MLOps practices and tools (CI/CD for ML, MLflow).
- Knowledge of vector databases and LLM deployment pipelines.
- Familiarity with prompt engineering and RAG (Retrieval-Augmented Generation) techniques.
- Exposure to generative AI projects on cloud platforms.