AI Researcher (Multimodal Audio-Video Generation)

San Francisco, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

About Us

Tavus is a research lab pioneering human computing. We're building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today's systems. Our real-time human simulation models let machines see, hear, respond, and even look real, enabling meaningful face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We're a Series A company backed by world-class investors, including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role
We're looking for an AI Researcher to join our core AI team and push forward the science of audio-visual avatar generation. If you thrive in high-speed startup environments, enjoy experimenting with generative models, and love seeing your research ship into production, then you'll feel right at home.

Your Mission

Research and develop audio-visual generation models for conversational agents (e.g., Neural Avatars, Talking-Heads).
Focus on models that are tightly coupled with conversation flow, ensuring verbal and non-verbal signals work seamlessly together.
Experiment with diffusion models (DDPMs, LDMs, etc.), long-video generation, and audio generation.
Collaborate with the Applied ML team to bring your research into real-world production.
Stay ahead of the latest advancements in multimodal generation and help shape the next wave.

You'll Be Great At This If You Have:

A PhD (or near completion) in a relevant field or equivalent hands-on research experience.
Experience applying image/video generation models in practice.
Strong foundations in generative modeling and rapid prototyping.
Deep familiarity with diffusion models, including recent advances in efficiency.
Good understanding of video-language models and multimodal generation.
Proficiency in PyTorch and GPU-based inference.

Nice-to-Haves

Experience with long-video or audio generation.
Skills in 3D graphics, Gaussian splatting, or large-scale training setups.
Broader exposure to generative models and rendering.
Familiarity with software engineering best practices.
Publications in top-tier or respected venues (CVPR, NeurIPS, BMVC, ICASSP, etc.).

Location
Preferred: San Francisco (hybrid) or London (office opening soon). Remote work within the U.S. or Europe is available for exceptional candidates.

About Company

Tavus

Tavus is the leading AI video research company that enables product development teams to build white-labeled digital twin experiences with easy-to-use APIs.
