Research Engineer, Audio Dialogue

Mountain View, CA - USA

Salary: $197,000 - $291,000

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Snapshot

Artificial Intelligence could be one of humanity's most useful inventions. At Google DeepMind, we're a team of scientists, engineers, machine learning experts, and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

Who are we

Real-Time Dialog team in GDM Audio

Our mission

We are building the next generation of conversational capabilities powered by the Gemini LLM. Our mission is to empower multimodal conversational agents with groundbreaking speech and audio capabilities. By starting with LLMs that natively understand rich audio input, we aim to create agents that can orchestrate all aspects of a dialog. This includes knowing when to listen and wait, when to interrupt, reading the emotive style, and coordinating complex multimodal interactions that span audio, video, and text. As part of the Gemini Audio team, our mission is to create, scale, and productionize novel capabilities into the core Gemini model, impacting many product areas across Google.

Team Objectives:

Create real-time dialog capabilities for Gemini agents that seamlessly span audio, video, and text modalities.
Pioneer end-to-end ML/Gemini architectures that streamline the dialog process, minimizing the need for complex model cascades.
Have direct impact on Gemini core model development - setting the direction for real-time dialog capabilities, covering pre-training and post-training.
Collaborate on deployments in Gemini Live (GL), Cloud, XR (Glasses), Astra, and other product areas, pushing the boundaries of multimodal interaction research.
Scale core modeling advancements to meet product requirements / model families, including latency, safety, and factuality.

Job responsibilities

As a Senior Research Engineer, you will:

Lead efforts to produce more natural and capable real-time dialog agents.
Collaborate closely with other teams on areas like reasoning, function calling, and multi-agent frameworks.
Partner with product teams to design, develop, and deploy novel multimodal conversational agents.
Create new data pipelines covering both real and synthetic sources, influencing pre-training and post-training (SFT & RL).

About You

In order to set you up for success as a Research Engineer at Google DeepMind, we look for the following skills and experience:

Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
Significant industry experience building and deploying Speech / ML models.
Demonstrated experience in data preparation, training, and evaluation of ML models.
5 years of experience with software development in Python, esp. ML frameworks like TensorFlow, JAX, PyTorch, etc.

In addition, the following would be an advantage:

Ph.D. in Computer Science or a related field.
Experience with multimodal foundation models.
Experience with real-time multimodal dialog systems.
Hands-on experience with the Gemini models (data processing, training, SFT, RL, serving).
Research background in NLP / Generative AI.
Experience with C++.

The US base salary range for this full-time position is $197,000 - $291,000, plus bonus, equity, and benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.

At Google DeepMind, we value diversity of experience, knowledge, backgrounds, and perspectives, and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy or related condition (including breastfeeding), or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

Key Skills

Computer Science
E & I
Debugging
C/C++
Objective C
Swift
OS Kernels
Signal Processing
Matlab
Unreal Engine
Middleware
IOS

About Company

DeepMind

Artificial intelligence could be one of humanity’s most useful inventions. We research and build safe artificial intelligence systems. We're committed to solving intelligence, to advance science and benefit humanity.
