Job Title: Data Scientist
Job Location: San Francisco
Job Type: Contract
Job Description:
- 50% Consulting with internal teams (economists, analysts) to design and implement AI solutions for their use cases
- 25% Building and maintaining CDP's core AI/ML models and frameworks
- 25% Providing technical support and troubleshooting for AI/ML systems
- You'll work in a collaborative environment using cutting-edge technologies, including Databricks, AWS, Collibra, DataMesh architecture, and PySpark, to build scalable, production-ready AI systems.
- This is a foundational role: you'll establish our MLOps practices, GenAI frameworks, and production AI capabilities from the ground up in a highly regulated Federal environment.
What You'll Be Doing
- Consulting & Enablement (50%)
- Your number one job will be to advise economists and business teams on appropriate modeling approaches based on their use cases
- Advise on appropriate modeling approaches for diverse scenarios: RAG/knowledge bases, anomaly detection, document understanding, audit analysis
- Bridge the gap between econometric models (R, Stata) and production ML pipelines
- Review and provide feedback on AI/ML architectural proposals
- Train data engineers and business users on AI/ML best practices
- Model Development (25%)
- Build production-ready AI systems for document processing (PDF, XLSX, DOCX, CSV, etc.)
- Develop and deploy 1-2 RAG/knowledge base systems in the first year
- Create reusable GenAI frameworks and patterns for the organization
- Implement solutions using AWS AI services (Bedrock, SageMaker, Textract, Databricks, etc.)
- Ensure models meet explainability requirements for regulated environments
- MLOps & Support (25%)
- Establish MLOps framework and model deployment patterns
- Troubleshoot model performance issues (accuracy, latency, cost)
- Act as an escalation point for AI/ML technical issues
- Train users by providing models, documentation, and consulting
- Monitor and maintain production models
- Stay current on AI/ML techniques and Federal regulatory requirements
- Help other Support Team members advance their knowledge of Data Science and modeling
Minimum Qualifications
Education: Master's degree in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
Experience: 4 years in data science, ML engineering, or AI development roles
Production ML: Proven track record building and deploying ML/AI models in production environments
Programming: Strong Python proficiency; experience with SQL and at least one statistical language (R, Stata, MATLAB, sparklyr)
ML Frameworks: Hands-on experience with modern ML frameworks (scikit-learn, TensorFlow, PyTorch, Hugging Face)
Generative AI: Practical experience with LLMs, RAG architectures, and prompt engineering
Document AI: Experience processing and extracting insights from unstructured documents at scale
Cloud Platforms: Working knowledge of AWS AI/ML services (SageMaker, Bedrock preferred)
Communication: Ability to explain complex AI concepts to non-technical stakeholders and translate business problems into technical solutions
Tooling: Experience working with our tech stack (Databricks, AWS AI/ML tools, Starburst preferred)
**Screening Questions**
Production ML Experience
1. Can you describe a machine learning model you deployed into production?
o What tools did you use (AWS, Databricks, SageMaker, etc.)?
o How did you handle monitoring and performance issues?
Generative AI / RAG
2. Have you built or implemented a RAG (Retrieval-Augmented Generation) system?
o What vector database or retrieval approach did you use?
o How did you handle embeddings and chunking?
AWS AI Stack
3. What AWS AI/ML services have you used (e.g., SageMaker, Bedrock, Textract)?
4. Have you deployed models in AWS in a regulated or secure environment?
Databricks / Spark
5. Have you used Databricks or PySpark in production?
o Was it for ETL, feature engineering, or model training?
Document Intelligence
6. Have you worked with unstructured documents (e.g., PDF, DOCX) at scale?
o What tools or frameworks did you use for extraction?
Technical Deep-Dive Questions
Generative AI / LLMs
1. Walk me through how you would design a RAG system from scratch in AWS.
o Embeddings
o Vector store
o Prompt strategy
o Evaluation metrics
2. How do you control hallucinations in LLM-based systems?
3. What considerations are important when deploying LLMs in a regulated environment?
Databricks / PySpark
4. Explain how you would use PySpark for feature engineering on large datasets (100 TB).
5. What are the trade-offs between Spark MLlib and scikit-learn?
Production ML & MLOps
6. How have you implemented CI/CD for ML models?
7. What metrics do you monitor in production?
o Drift
o Latency
o Cost
o Accuracy degradation
8. How do you manage model versioning and rollback?
Document AI
9. How would you build a scalable document ingestion pipeline for PDFs and XLSX files?
10. Have you used AWS Textract or similar tools? How did you validate extraction quality?
Consulting & Cross-Functional Work
11. How do you explain complex ML concepts to non-technical stakeholders (e.g., economists)?
12. Have you ever translated econometric models (R/Stata) into production?
Must-Have Skills:
1) 4 years in data science, ML engineering, or AI development roles
2) Strong Python proficiency
3) Experience with SQL and at least one statistical language (R, Stata, MATLAB, sparklyr)
4) Hands-on experience with modern ML frameworks (scikit-learn, TensorFlow, PyTorch, Hugging Face)
5) Practical experience with LLMs, RAG architectures, and prompt engineering
6) Experience processing and extracting insights from unstructured documents at scale
7) Experience working with our tech stack (Databricks, AWS AI/ML tools, Starburst preferred)
8) Proven track record building and deploying ML/AI models in production environments
9) Working knowledge of AWS AI/ML services (SageMaker Bedrock preferred)
10) Ability to explain complex AI concepts to non-technical stakeholders and translate business problems into technical solutions
11) Master's degree in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
**Soft Skill Requirements**
Communication Skills (Rate 1-10): 10
Interview Process: Teams video interview
Is Candidate to Provide Their Own Laptop? No
Please make particular note of the following requirements. This MUST be covered with your candidates.
The background screenings shall include but are not limited to the following core inquiries:
- National Crime Information Center (NCIC) check
- Fieldprint FBI fingerprint check
- Sterling 10-year BGC (Criminal, Education, Employment, Resume Comparison)
- Social Security Number verification
- Office of Foreign Assets Control (OFAC) Watch List check
- Education verification
- Employment history report
- Drug screening, where permitted by applicable law
- Peraton Moderate (Personal Investigation Interview: includes an interview with a PI and submission of a personal history statement requesting details around citizenship, travel, employment, education, residence, landlords, 7 references, financial hardships, investments, and a credit check)