Meta this week previewed a voice-based generative AI model that could one day swap your virtual assistant’s voice for the voice of someone you know.
“Voicebox can produce high-quality audio clips and edit pre-recorded audio — like removing car horns or a dog barking — all while preserving the content and style of the audio,” Meta says. “The model is also multilingual and can produce speech in six languages.”
Meta teased Voicebox as a way to make virtual assistants sound less robotic or power non-playable characters in the metaverse. But for now, we’re just getting a sneak peek.
“Because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time,” the company says. “While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it’s also necessary to strike the right balance between openness with responsibility.”
To that end, Meta’s AI team shared audio samples and a research paper that details the results they’ve achieved thus far.
In a video demonstrating text-to-speech capabilities, an audio clip run through Voicebox produces the same phrase in six different voice styles. Voicebox also takes a clip of someone talking and has their voice read a different phrase they never uttered in real life.
It also strips out audio of a dog barking in the background of a meeting, and changes a word (“guys” to “everyone”) in the final, recorded version. And it takes audio of someone saying something in a different language and has that voice say it in English, using their voice style.
The news comes several months after CEO Mark Zuckerberg said the
Read more on pcmag.com