Senior Research Engineer, Audio Post-Training

London - UK

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Welcome to the video-first world

From your everyday PowerPoint presentations to Hollywood movies, AI will transform the way we create and consume content.

Today, people want to watch and listen, not read, both at home and at work. If you're reading this and nodding, check out our brand video.

Despite the clear preference for video, communication and knowledge sharing in the business environment are still dominated by text, largely because high-quality video production remains complex and challenging to scale, until now.

Meet Synthesia

We're on a mission to make video easy for everyone. Born in an AI lab, our AI video communications platform simplifies the entire video production process, making it easy for everyone, regardless of skill level, to create, collaborate, and share high-quality videos. Whether it's for delivering essential training to employees and customers or marketing products and services, Synthesia enables large organizations to communicate and share knowledge through video quickly and efficiently. We're trusted by leading brands such as Heineken, Zoom, Xerox, McDonald's and more. Read stories from happy customers and what 1,200 people say on G2.

In February 2024, G2 named us the fastest growing company in the world. Today we're at a $2.1bn valuation, and we recently raised our Series D. This brings our total funding to over $330M from top-tier investors, including Accel, Nvidia, Kleiner Perkins, and Google, and top founders and operators, including Stripe, Datadog, Miro, Webflow, and Facebook.

What you'll do at Synthesia:

As a Research Engineer, you will join a team of 40 Researchers and Engineers within the R&D Department, working on cutting-edge challenges in the Generative AI space, with a focus on creating high-quality, expressive, and real-time synthetic voices. Within the team, you'll have the opportunity to work on the applied side of our research efforts and directly impact our solutions that are used worldwide by over 60,000 businesses.

If you are an expert in ML, LLMs, speech generation, or conversational models, this is your chance to make a global impact. You will join our Audio Post-Training Team, which works on generative speech and voice synthesis, ensuring our in-house voice models reach production-level quality, speed, and robustness. Typical projects include:

Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).
Fine-tune and optimize speech models using advanced techniques such as DPO (Direct Preference Optimization), LoRA, and other parameter-efficient methods to improve voice quality and expressiveness.
Implement post-training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real-time speech generation.
Integrate and test novel architectures such as neural codecs, diffusion, or flow-matching models to enhance realism and responsiveness.
Design and implement new evaluation metrics for TTS systems, including automated Mean Opinion Score (MOS) prediction models for continuous quality assessment.
Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.

What we're looking for:

Strong understanding of generative modelling, ideally applied to sequential or multimodal data.
Hands-on experience with large language models (LLMs) or similar transformer-based architectures.
High proficiency in PyTorch, including experience with distributed training and model optimization.
Solid grasp of time-series modelling and tokenization, preferably in the context of audio or speech.
Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.
Proven experience in training deep learning models end-to-end, from data preparation to evaluation.
Strong general software engineering skills, enabling contributions to a large shared research infrastructure.

Nice-to-have experience

Familiarity with state-of-the-art architectures in audio and speech generation (e.g. diffusion models, neural codecs, flow-matching models, autoregressive decoders).
Experience with speech-to-speech or text-to-speech (TTS) systems.
Evidence of original research contributions, such as publications or open-source work in top-tier venues (e.g. ICASSP, Interspeech, NeurIPS, ICML).

Why join us

We're living in the golden age of AI. The next decade will yield the next iconic companies, and we dare to say we have what it takes to become one. Here's why:

Our culture

At Synthesia we're passionate about building, not talking, planning or politicising. We strive to hire the smartest, kindest and most unrelenting people and let them do their best work without distractions. Our work principles serve as our charter for how we make decisions, give feedback and structure our work to empower everyone to go as fast as possible. You can find out more about these principles here.

Serving 50,000 customers (and 50% of the Fortune 500)

We're trusted by leading brands such as Heineken, Zoom, Xerox, McDonald's and more. Read stories from happy customers and what 1,200 people say on G2.

Proprietary AI technology

Since 2017, we've been pioneering advancements in Generative AI. Our AI technology is built in-house by a team of world-class AI researchers and engineers. Learn more about our AI Research Lab and the team behind it.

AI Safety, Ethics and Security

AI safety, ethics and security are fundamental to our mission. While the full scope of Artificial Intelligence's impact on our society is still unfolding, our position is clear: People first. Always. Learn more about our commitments to AI Ethics, Safety & Security.

The good stuff...

Competitive compensation (salary, stock options, bonus)
Fully remote from Europe, or hybrid work setting with an office in London, Amsterdam, Zurich, or Munich
25 days of annual leave, plus public holidays
Great company culture with the option to join regular planning and socials at our hubs
Other benefits depending on your location

You can see more about Who we are and How we work here.

Experience: Senior IC

Key Skills

Computer Science
E & I
Debugging
C/C++
Objective C
Swift
OS Kernels
Signal Processing
Matlab
Unreal Engine
Middleware
iOS

About Company

Synthesia

Create AI videos from text with an AI video generator. Get the most advanced AI avatars and voiceovers in 130+ languages. Try the free AI video generator today!
