Speech Software Engineer

Mountain View, CA - USA

Vacancies: 1 Vacancy

Job Summary

At ASAPP, our mission is simple: deliver the best AI-powered customer experience, faster than anyone else. We are guided by principles that shape how we think, build, and execute, including deep customer obsession, purposeful speed, ownership, and a relentless focus on outcomes. We work in small, highly skilled teams, prioritize clarity over complexity, and continuously evolve through curiosity, data, and craftsmanship.

We're building a globally diverse team of technologists and problem solvers who thrive in fast-paced environments, value collaboration, and approach every challenge with a Day 1 mindset. We have hubs in New York City, Mountain View, Latin America, and India. If you're driven by continuous learning, rapid iteration, and the challenge of building in a high-growth startup, this is more than a role; it's a journey.

We are seeking a Senior Speech Software Engineer to drive both the infrastructure and the applied speech intelligence behind our real-time voice AI platform. This is not just a systems role: you will operate at the intersection of speech research, model optimization, and production engineering, ensuring our ASR and TTS systems meet the demanding quality, latency, and reliability requirements of enterprise call centers.

You will help evolve our speech stack to deliver human-like, low-latency voice interactions at massive scale, tuning and adapting modern speech models to perform in noisy, real-world customer environments. You will work closely with Speech Scientists, ML Researchers, and Infrastructure Engineers to bridge cutting-edge speech technology with hardened production systems.

What you'll do

Speech Model Optimization & Applied Research

Tune and optimize ASR and TTS models for real-world call center environments, improving transcription accuracy and robustness to noise and speaker variability

Improve spoken output naturalness by refining prosody, pacing, number and spelling pronunciation, and conversational flow

Balance latency vs. quality tradeoffs in streaming speech pipelines to maintain real-time responsiveness

Evaluate and integrate emerging speech technologies (e.g., noise suppression, voice activity detection, diarization) to measurably improve performance

Voice Infrastructure & Systems Engineering

Architect and modernize a scalable, high-availability voice infrastructure that replaces legacy systems

Build multi-threaded, low-latency server frameworks capable of handling thousands of concurrent real-time audio streams

Design and operate streaming ASR → LLM → TTS pipelines that power live AI-driven customer conversations

Develop robust media stream handling to ensure reliable audio flow between telephony providers, clients, and ML services

Evaluation, Observability & Quality

Define and implement speech quality evaluation frameworks, including WER/CER analysis, latency tracking, and perceptual TTS metrics

Build tooling and dashboards to monitor production performance and detect regressions in accuracy, latency, or naturalness

Create load-testing and simulation tools to model high-concurrency, real-world voice traffic

Cross-Functional Collaboration

Partner with Speech Scientists and ML Researchers to productionize new ASR and TTS models

Work with Security and Compliance teams to ensure voice data handling meets enterprise and regulatory standards

Collaborate with Product teams to translate conversational quality requirements into measurable system improvements

What you'll need

Core Engineering Background

5 years of software engineering experience building and operating production-grade distributed systems

Strong proficiency in Golang or Python (or willingness to become an expert quickly)

Experience designing low-latency, high-concurrency systems, ideally involving real-time media or streaming data

Speech & Audio Expertise

Practical experience working with ASR and/or TTS systems in applied or production environments

Understanding of how to adapt and tune speech models for domain-specific use cases

Familiarity with speech quality metrics such as WER, CER, MOS, latency, and streaming stability

Strong grasp of audio fundamentals, including sample rates, codecs (Opus, G.711), buffering, packet loss, and jitter

Applied ML for Speech

Experience evaluating model performance and running structured experiments to improve transcription accuracy and speech naturalness

Comfort working with modern ML tooling and model APIs to fine-tune, adapt, or post-process speech model outputs

Ability to make pragmatic tradeoffs between model quality, compute cost, and real-time constraints

What we'd like to see

Experience with noise reduction, echo cancellation, VAD, diarization, or other speech enhancement technologies
Familiarity with forced alignment techniques or phoneme/word-level timing models
Hands-on experience deploying ML services with Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure)
Knowledge of event-driven and asynchronous systems (e.g., async I/O, event loops, streaming frameworks)
Experience analyzing large-scale speech or conversation datasets to drive model or system improvements

$215,000 - $235,000 a year

The compensation includes salary plus a performance bonus. The actual salary may vary depending on non-discriminatory factors such as qualifications, experience, and other factors permitted by law.

ASAPP is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, or veteran status. If you have a disability and need assistance with our employment application process, please email us at [email protected] to obtain assistance. #LI-AG1 #LI-Hybrid

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

About Company

ASAPP

Improve customer experience and radically increase CX performance at the same time. This AI-Native® software platform provides AI-driven predictions on what agents should say and do throughout each interaction and increasingly automates routine tasks before, during, and after the conversa ...
