Job Title: AI/ML Engineer
Location: Frisco, TX / Atlanta, GA / Bellevue, WA (onsite from day 1; local candidates only)
Employment type: W2 only (no C2C)
Job Description:
- We are seeking an AI/ML Engineer to build the intelligent systems that power identity resolution and data accessibility within our Customer Data Platform (CDP) - the authoritative source of truth for customer data across the entire US adult population.
- This role focuses on developing machine learning pipelines that deduplicate, link, and resolve customer identities across disparate data sources - the core capability that transforms raw data into trusted, unified customer profiles. You will also contribute to LLM-based solutions that enable natural language querying of CDP data, making the platform accessible to business users across the organization.
- You will work on both classical ML techniques and modern LLM-based approaches to ensure that every customer identity in CDP is accurately resolved, every profile is trustworthy, and every user can access the data they need.
Job Responsibilities:
- Develop and deploy entity resolution models to match and deduplicate customer records across multiple systems - directly impacting the accuracy of CDP as the source of truth
- Implement probabilistic matching techniques (e.g., Fellegi-Sunter) and ML models (gradient boosting, neural classifiers) for record linkage across the US adult population
- Build candidate blocking pipelines using phonetic algorithms (Soundex, Double Metaphone), token similarity, and LSH to handle billions of potential match pairs efficiently
- Apply fuzzy matching techniques (Levenshtein, Jaro-Winkler, Jaccard) to customer attributes such as name, address, phone, and identifiers
- Develop clustering algorithms (DBSCAN, hierarchical clustering) to create unified golden customer profiles that serve as the authoritative representation of each individual
- Build embedding-based similarity systems using Sentence-BERT or other transformer-based models for semantic matching
- Implement ANN/KNN retrieval systems (FAISS, Annoy) for large-scale entity matching across population-scale datasets
Job Responsibilities - AI/LLM:
- Use LLMs (e.g., GPT, Claude) for classification and disambiguation of entity matches, improving resolution accuracy where traditional methods fall short
- Build and support RAG pipelines to enrich customer profiles with contextual data from unstructured sources
- Perform prompt engineering and evaluation for structured data extraction from unstructured inputs feeding into CDP
- Contribute to NLQ-to-SQL systems enabling business users to query CDP data using natural language - making the authoritative source of truth accessible to non-technical stakeholders
- Support integration with vector databases (e.g., Pinecone, pgvector, Qdrant) for semantic search across customer data
Education and Work Experience:
- Bachelor's or master's degree in Computer Science, Data Science, or a related field
- 3 years of experience in ML/AI engineering
- At least 1 year of experience in entity resolution, record linkage, or deduplication - ideally at scale
Technical Skills:
- Programming: Python (required)
- Libraries: scikit-learn, Hugging Face Transformers, RapidFuzz, jellyfish
- Experience with LLM APIs (OpenAI, Anthropic) and prompt pipelines
- Strong SQL skills and experience with Spark or Dask for distributed processing
- Familiarity with vector databases and embedding-based retrieval
- Experience with ML lifecycle tools (MLflow or similar)
- Understanding of data quality metrics and how identity resolution impacts downstream trust
Knowledge, Skills, and Abilities:
- Strong understanding of ML fundamentals and similarity matching techniques applied to customer identity
- Ability to work with large, messy, real-world datasets spanning hundreds of millions of records
- Understanding of precision/recall tradeoffs in identity resolution and their impact on data trust
- Good problem-solving and analytical skills
- Ability to collaborate with data engineering, platform, and business teams to deliver accurate customer profiles
Best Regards:
Tanuja P
Phone: 1-
Email: