Date: 2024-07-12

Microsoft created a text-to-speech AI generation model that was so good at replicating human voices the company deemed it too unsafe to bring to market.

Artificial intelligence-powered tools such as ChatGPT are only getting more sophisticated and impressive, but what happens when they get so good that it becomes impossible to distinguish between humans and machines?

Unfortunately, that point has already been reached, at least when it comes to AI voice generators. LiveScience spotted Microsoft quietly explaining that it had created an AI text-to-voice generator so powerful the company deemed it too unsafe to release to the public, as the model was able to "generate accurate, natural speech in the exact voice of the original speaker". As you can probably imagine, making such a tool available to the public would likely result in an increase in fraud, impersonations, etc.

Microsoft's dangerous AI model is called VALL-E 2, and in a pre-print paper posted on June 17, researchers explain that the model marks a milestone in text-to-speech synthesis and has achieved "human parity for the first time." What this means is that Microsoft's benchmark testing found speech generated by VALL-E 2 matched human speech, or even exceeded it in some cases.

"Our experiments, conducted on the LibriSpeech and VCTK datasets, have shown that VALL-E 2 surpasses previous zero-shot TTS systems in speech robustness, naturalness, and speaker similarity," the researchers wrote. "It is the first of its kind to reach human parity on these benchmarks."

Microsoft states that VALL-E 2 is "purely a research project," with the company explaining it has no plans to incorporate VALL-E 2 into a product or make it available to the public. However, Microsoft did outline some use cases for the technology, writing in a blog post that VALL-E 2 could be put to use in the following areas: education, journalism, self-authoring content, accessibility features, voice response systems, translation, and chatbots.