Data Scientist
Rockville, MD - USA
Job Summary
Greetings from Conch Technologies Inc.
Key Responsibilities:
Conversation Analytics & Insight Generation
Design and implement analytics pipelines that extract meaningful patterns from large-scale chat conversation data
Develop facet extraction approaches using LLMs to categorize conversations by request type, task performed, and topic discussed
Build dashboards and reporting artifacts that communicate usage trends, emerging topics, and user behavior to stakeholders
Identify and quantify shifts in conversation patterns over time to inform product roadmap and content strategy
Translate analytical findings into actionable recommendations for platform improvement
Clustering & Unsupervised Learning
Architect and optimize hierarchical clustering pipelines using density-based algorithms (e.g., HDBSCAN) to group conversations by semantic similarity
Generate and manage text embeddings at scale using embedding models for downstream clustering and similarity tasks
Design multi-level clustering strategies that produce both granular groupings and higher-order category taxonomies
Evaluate cluster quality using persistence metrics, silhouette analysis, and domain-informed validation
Experiment with clustering parameters, distance metrics, and dimensionality reduction techniques to improve grouping coherence
Data Engineering & Pipeline Development
Build and maintain data pipelines in Python for ingesting, transforming, and analyzing conversation datasets
Develop automated workflows using cloud-native orchestration and compute services to run analytics at scale on a scheduled cadence
Work with object storage, search engines, and relational databases to store and query analytical outputs
Implement caching, batching, and incremental processing strategies to handle large embedding and clustering workloads efficiently
Maintain reproducible analysis environments and version analytical artifacts (models, cluster outputs, and embeddings)
LLM-Assisted Analysis
Design and refine LLM prompts for facet extraction, cluster labeling, and conversation summarization
Evaluate LLM output quality for analytical tasks and iterate on prompt strategies to improve accuracy
Leverage model infrastructure for embedding generation and LLM inference
Explore emerging techniques in LLM-driven data analysis, topic modeling, and automated insight generation
Quality & Testing
Develop evaluation frameworks to measure clustering quality, facet extraction accuracy, and analytical pipeline correctness
Build automated regression tests to detect drift in clustering outputs or degradation in categorization quality
Validate analytical results against known baselines and domain expertise
Document the methodologies, assumptions, and limitations of analytical approaches
Security & Compliance
Help ensure adherence to technology policies and comply with all security controls
Implement secure coding practices, particularly when handling personally identifiable information (PII) and other sensitive data
Participate in threat modeling and security discussions for API and infrastructure components
Understand and apply organizational security standards and best practices
Chiranjeevi Kosetty