Meta has taken to its website to announce Voicebox, a new generative AI model that the company describes as a "breakthrough in generative AI for speech".

According to the press release, Voicebox is aimed at creators who want to speed up the audio editing process, for example by removing car horns or a barking dog from the background of pre-recorded audio clips.
Additionally, Voicebox will be used to help visually impaired people hear written messages from friends and family. Meta states that Voicebox is multilingual and supports six languages, so users will be able to speak any of those languages in their own voice. The languages are English, French, German, Spanish, Polish, and Portuguese.
How does this work? It's quite simple. A user gives Voicebox an audio sample of their voice, which can be as short as a two-second clip. From there, the artificial intelligence extrapolates the voice and creates a specific audio style, giving the user a replica of their own voice.
Voicebox has raised some important ethical questions, as people will now be able to replicate the voices of loved ones, best friends, and enemies with as little as two seconds of audio. What are the unintended implications of such a technology?
Meta is aware of the potential danger of such a technology and has thankfully kept Voicebox's underlying code under wraps.
"There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time," the company wrote in a research blog.