- Speech Model Optimization & Applied Research
Tune and optimize ASR and TTS models for real-world call center environments, improving transcription accuracy and robustness to noise and speaker variability
Improve spoken output naturalness by refining prosody, pacing, number and spelling pronunciation, and conversational flow
Balance latency vs. quality tradeoffs in streaming speech pipelines to maintain real-time responsiveness
Evaluate and integrate emerging speech technologies (e.g., noise suppression, voice activity detection, diarization) to measurably improve performance
- Voice Infrastructure & Systems Engineering
Architect and modernize a scalable, high-availability voice infrastructure that replaces legacy systems
Build multi-threaded, low-latency server frameworks capable of handling thousands of concurrent real-time audio streams
Design and operate streaming ASR → LLM → TTS pipelines that power live AI-driven customer conversations
Develop robust media stream handling to ensure reliable audio flow between telephony providers, clients, and ML services
- Evaluation, Observability & Quality
Define and implement speech quality evaluation frameworks, including WER/CER analysis, latency tracking, and perceptual TTS metrics
Build tooling and dashboards to monitor production performance and detect regressions in accuracy, latency, or naturalness
Create load-testing and simulation tools to model high-concurrency, real-world voice traffic
- Cross-Functional Collaboration
Partner with Speech Scientists and ML Researchers to productionize new ASR and TTS models
Work with Security and Compliance teams to ensure voice data handling meets enterprise and regulatory standards
Collaborate with Product teams to translate conversational quality requirements into measurable system improvements