SpeechX – Microsoft Research

“SpeechX is a versatile speech generation model leveraging audio and text prompts, which can deal with both clean and noisy speech inputs and perform zero-shot TTS and various tasks involving transforming the input speech. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting. This enables unified treatment of various tasks in an extensible manner, providing a consistent way of leveraging text input for speech enhancement and transformation. The current model, trained on 60K hours of speech audio, can perform zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing, where the spoken content can be altered while preserving the speaker and background sounds…”

Source: www.microsoft.com/en-us/research/project/speechx/

August 31, 2023