Job Title: Data Scientist
Job Location: San Francisco
Job Type: Contract
Job Description:
- 50% Consulting with internal teams (economists, analysts) to design and implement AI solutions for their use cases
- 25% Building and maintaining CDP's core AI/ML models and frameworks
- 25% Providing technical support and troubleshooting for AI/ML systems
- You'll work in a collaborative environment using cutting-edge technologies, including Databricks, AWS, Collibra, DataMesh architecture, and PySpark, to build scalable, production-ready AI systems.
- This is a foundational role: you'll establish our MLOps practices, GenAI frameworks, and production AI capabilities from the ground up in a highly regulated Federal environment.
What You'll Be Doing
- Consulting & Enablement (50%)
- Your number one job will be to advise economists and business teams on appropriate modeling approaches based on their use cases
- Advise on appropriate modeling approaches for diverse scenarios: RAG/knowledge bases, anomaly detection, document understanding, audit analysis
- Bridge the gap between econometric models (R, Stata) and production ML pipelines
- Review and provide feedback on AI/ML architectural proposals
- Train data engineers and business users on AI/ML best practices
- Model Development (25%)
- Build production-ready AI systems for document processing (PDF, XLSX, DOCX, CSV, etc.)
- Develop and deploy 1-2 RAG/knowledge base systems in the first year
- Create reusable GenAI frameworks and patterns for the organization
- Implement solutions using AWS AI services (Bedrock, SageMaker, Textract, Databricks, etc.)
- Ensure models meet explainability requirements for regulated environments
- MLOps & Support (25%)
- Establish MLOps framework and model deployment patterns
- Troubleshoot model performance issues (accuracy, latency, cost)
- Act as an escalation point for AI/ML technical issues
- Train users by providing models, documentation, and consulting
- Monitor and maintain production models
- Stay current on AI/ML techniques and Federal regulatory requirements
- Help other Support Team members advance their knowledge of Data Science and modeling
Minimum Qualifications
Education: Master's degree in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
Experience: 4 years in data science, ML engineering, or AI development roles
Production ML: Proven track record building and deploying ML/AI models in production environments
Programming: Strong Python proficiency; experience with SQL and at least one statistical language (R, Stata, MATLAB, sparklyr)
ML Frameworks: Hands-on experience with modern ML frameworks (scikit-learn, TensorFlow, PyTorch, Hugging Face)
Generative AI: Practical experience with LLMs, RAG architectures, and prompt engineering
Document AI: Experience processing and extracting insights from unstructured documents at scale
Cloud Platforms: Working knowledge of AWS AI/ML services (SageMaker, Bedrock preferred)
Communication: Ability to explain complex AI concepts to non-technical stakeholders and translate business problems into technical solutions
Tooling: Experience working with our tech stack (Databricks, AWS AI/ML tools, Starburst preferred)
**Screening Questions**
Production ML Experience
1. Can you describe a machine learning model you deployed into production?
o What tools did you use (AWS, Databricks, SageMaker, etc.)?
o How did you handle monitoring and performance issues?
Generative AI / RAG
2. Have you built or implemented a RAG (Retrieval-Augmented Generation) system?
o What vector database or retrieval approach did you use?
o How did you handle embeddings and chunking?
AWS AI Stack
3. What AWS AI/ML services have you used (e.g., SageMaker, Bedrock, Textract)?
4. Have you deployed models in AWS in a regulated or secure environment?
Databricks / Spark
5. Have you used Databricks or PySpark in production?
o Was it for ETL, feature engineering, or model training?
Document Intelligence
6. Have you worked with unstructured documents (e.g., PDF, DOCX) at scale?
o What tools or frameworks did you use for extraction?
Technical Deep-Dive Questions
Generative AI / LLMs
1. Walk me through how you would design a RAG system from scratch in AWS.
o Embeddings
o Vector store
o Prompt strategy
o Evaluation metrics
2. How do you control hallucinations in LLM-based systems?
3. What considerations are important when deploying LLMs in a regulated environment?
Databricks / PySpark
4. Explain how you would use PySpark for feature engineering on large datasets (100 TB).
5. What are the trade-offs between Spark MLlib and scikit-learn?
Production ML & MLOps
6. How have you implemented CI/CD for ML models?
7. What metrics do you monitor in production?
o Drift
o Latency
o Cost
o Accuracy degradation
8. How do you manage model versioning and rollback?
Document AI
9. How would you build a scalable document ingestion pipeline for PDFs and XLSX files?
10. Have you used AWS Textract or similar tools? How did you validate extraction quality?
Consulting & Cross-Functional Work
11. How do you explain complex ML concepts to non-technical stakeholders (e.g., economists)?
12. Have you ever translated econometric models (R/Stata) into production?
Must-Have Skills:
1) 4 years in data science, ML engineering, or AI development roles
2) Strong Python proficiency
3) Experience with SQL and at least one statistical language (R, Stata, MATLAB, sparklyr)
4) Hands-on experience with modern ML frameworks (scikit-learn, TensorFlow, PyTorch, Hugging Face)
5) Practical experience with LLMs, RAG architectures, and prompt engineering
6) Experience processing and extracting insights from unstructured documents at scale
7) Experience working with our tech stack (Databricks, AWS AI/ML tools, Starburst preferred)
8) Proven track record building and deploying ML/AI models in production environments
9) Working knowledge of AWS AI/ML services (SageMaker Bedrock preferred)
10) Ability to explain complex AI concepts to non-technical stakeholders and translate business problems into technical solutions
11) Master's degree in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
**Soft Skill Requirements**
Communication Skills (Rate 1-10): 10
Interview Process: Teams video interview
Is Candidate to Provide Their Own Laptop? No
Please make particular note of the following requirements. This MUST be covered with your candidates.
The background screenings shall include but are not limited to the following core inquiries:
- National Crime Information Center (NCIC) check
- Fieldprint FBI fingerprint check
- Sterling 10-year BGC (Criminal, Education, Employment, Resume Comparison)
- Social Security Number verification
- Office of Foreign Assets Control (OFAC) Watch List check
- Education verification
- Employment history report
- Drug screening, where permitted by applicable law
- Peraton Moderate (Personal Investigation Interview: includes an interview with a PI and submission of a personal history statement requesting details around citizenship, travel, employment, education, residence, landlords, 7 references, financial hardships, investments, and a credit check)