OpenAI’s AI can clone any voice with only 15 seconds of audio

Web Desk

30th Mar, 2024. 03:29 pm


OpenAI has recently launched Voice Engine.
Multiple companies have access to this technology.
OpenAI has implemented strict usage policies for its partners.

OpenAI has recently launched Voice Engine, a platform that turns text into speech by creating a synthetic voice from a 15-second voice sample. The resulting AI-generated voice can read text prompts aloud in the speaker’s original language or in several other languages.

OpenAI explains in a blog post that small-scale deployments of the technology play a vital role in shaping its strategy, security protocols, and understanding of wider potential applications across various sectors.

A number of companies have been given access to this technology, including educational tech company Age of Learning, storytelling tool HeyGen, health software maker Dimagi, AI-powered communication app Livox, and healthcare provider Lifespan.

OpenAI’s shared examples demonstrate how Age of Learning uses Voice Engine to generate pre-scripted voice-overs and create “real-time, personalised responses” for students using GPT-4.

The technology, which OpenAI began developing in late 2022, has already been used to generate preset voices for its text-to-speech API and the Read Aloud feature in ChatGPT. Jeff Harris from OpenAI’s Voice Engine product team disclosed in a TechCrunch interview that the model was trained using a mix of licensed and publicly available data. OpenAI intends to restrict access to approximately 10 developers.

The domain of AI text-to-audio generation continues to progress, with companies like Podcastle and ElevenLabs concentrating on AI voice cloning technology. However, these advancements raise ethical concerns, as discussed by the Vergecast last year.

Meanwhile, the US government is tackling unethical uses of AI voice technology. For example, the Federal Communications Commission recently banned robocalls that use AI-generated voices, such as those imitating President Joe Biden.

OpenAI has implemented strict usage policies for its partners. These include prohibitions on using Voice Engine for impersonation without consent, requirements for explicit permission from the voice sample’s source, restrictions on user-generated voices, and the obligation to inform listeners that the voices are AI-generated. Additionally, OpenAI embeds watermarks in the audio for traceability and monitors how the technology is used.

To reduce potential risks, OpenAI proposes measures such as phasing out voice authentication for banking, setting up policies for voice use in AI, educating the public about AI deepfakes, and developing systems to track AI-generated content.
