Senior+ AI Researcher (Multimodal Perception Models)

San Francisco, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About Us

Tavus is a research lab pioneering human computing. We're building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today's systems. Our real-time human simulation models let machines see, hear, respond, and even look real, enabling meaningful face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We're a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role
We're hiring a Senior AI Researcher to lead foundational research at the intersection of multimodal modeling and conversational AI. You'll build multimodal models that capture interactions between the user and the human-like avatar in order to predict and control the avatar's audio, visual, and language responses. This isn't a role for someone who just wants to follow the roadmap. You'll help create it by steering our technical direction and shaping what the next generation of AI Human avatars looks like.

Your Mission

Lead research on Foundational Multimodal Models for Conversational Avatars: systems that can perceive, reason, and generate across video, audio, and language.
Build and train models using Autoregressive, Predictive (e.g. V-JEPA), and Diffusion-based architectures, with a deep focus on temporal and sequential data (not static frames).
Design and execute experiments to predict and control the visual, auditory, and linguistic responses of avatars.
Partner with the Applied ML team to bring research into real-world use cases.
Mentor other researchers and drive excellence across the team.

You'll Bring:

A PhD plus 2-3 years working hands-on with LLMs, VLMs, or multimodal systems.
Previous experience leading research efforts or mentoring teams.
Expertise in sequence modeling across video, audio, and text, with a strong understanding of autoregressive, predictive, and diffusion frameworks.
Experience with large-scale model training and optimization for performance and real-time generation.
Proven ability to translate research ideas into production-grade systems.
Publications in top-tier venues (CVPR, ICCV, NeurIPS, ECCV, ACMMM).
Strong PyTorch skills and comfort moving fluidly between research and engineering.

Nice-to-Haves

Broad familiarity with generative AI paradigms and foundation models.
Comfort working across the full research-to-deployment stack.
A builder's mindset: eager to experiment, iterate, and ship.

Location
Preferred: San Francisco (hybrid) or London (office opening soon). Remote within the U.S. or Europe available for exceptional candidates.

Benefits & Culture

When you join Tavus, you're joining a diverse and supportive team. Our work is driven by our people, and our success is shared by all. This position has a flexible work schedule, unlimited PTO, competitive healthcare, and gear stipends, as well as plenty of fun. At the end of the day, we want Tavus to be a place for you to learn, directly drive impact, and work with a team you love.

Tavus is growing fast, and we'd like you to grow with us. If you're excited to get your hands dirty and help make machines more human, drop your resume and we'll be in touch.

We are not looking for cultural fits; we are looking for culture creators. Diversity is what drives our success; it's at the core of how we hire, communicate, and work. We are inclusive to all and combine our diverse backgrounds, skill sets, and perspectives to build the best experiences for our clients.

Key Skills

Arm
Machine Learning
AI
C/C++
R
Clinical Trials
Experience Administering Injections
Research Experience
Research & Development
Assembly
Semantic Web
Vulnerability Research

About Company

Tavus

Tavus is the leading AI video research company that enables product development teams to build white-labeled digital twin experiences with easy-to-use APIs.