OpenAI Unveils Artificial Intelligence Technology That Re-Creates Human Voices

Posted 2024-03-29T20:10:15+00:00 - Updated 2024-03-29T22:17:51+00:00

Cade Metz, New York Times

SAN FRANCISCO — First, OpenAI offered a tool that allowed people to create digital images simply by describing what they wanted to see. Then, it built similar technology that generated full-motion video like something from a Hollywood movie.

Now, it has unveiled technology that can re-create someone’s voice.

The high-profile artificial intelligence startup said Friday that a small group of businesses was testing a new OpenAI system, Voice Engine, that can re-create a person’s voice from a 15-second recording. If you upload a recording of yourself and a paragraph of text, it can read the text using a synthetic voice that sounds like yours.

The text does not have to be in your native language. If you are an English speaker, for example, it can re-create your voice in Spanish, French, Chinese or many other languages.

OpenAI is not sharing the technology more widely because it is still trying to understand its potential dangers. Like image and video generators, a voice generator could help spread disinformation across social media. It could also allow criminals to impersonate people online or during phone calls.

The company said it was particularly worried that this kind of technology could be used to break voice authenticators that control access to online banking accounts and other personal applications.

“This is a sensitive thing, and it is important to get it right,” an OpenAI product manager, Jeff Harris, said in an interview.

The company is exploring ways of watermarking synthetic voices or adding controls that prevent people from using the technology with the voices of politicians or other prominent figures.

In February, OpenAI took a similar approach when it unveiled its video generator, Sora. It showed off the technology but did not publicly release it.

OpenAI is among the many companies that have developed a new breed of AI technology that can quickly and easily generate synthetic voices. They include tech giants such as Google as well as startups such as New York-based ElevenLabs. (The New York Times has sued OpenAI and its partner, Microsoft, on claims of copyright infringement involving AI systems that generate text.)

Businesses can use these technologies to generate audiobooks, give voice to online chatbots or even build an automated radio station DJ. Since last year, OpenAI has used its technology to power a version of ChatGPT that speaks. And it has long offered businesses an array of voices that can be used for similar applications. All of them were built from clips provided by voice actors.

But the company has not yet offered a public tool that would allow individuals and businesses to re-create voices from a short clip as Voice Engine does. The ability to re-create any voice in this way, Harris said, is what makes the technology dangerous. The technology could be particularly dangerous in an election year, he said.

In January, New Hampshire residents received robocalls that discouraged them from voting in the state primary, delivered in a voice that was most likely artificially generated to sound like President Joe Biden. The Federal Communications Commission later outlawed such calls.

Harris said OpenAI had no immediate plans to make money from the technology. He said the tool could be particularly useful to people who lost their voices through illness or accident.

He demonstrated how the technology had been used to re-create a woman’s voice after brain cancer damaged it. She could now speak, he said, after providing a brief recording of a presentation she had once made as a high schooler.

This article originally appeared in The New York Times.