Meta unveils Voicebox AI: Should we all be worried?

Unpacking the promises and perils of Meta's new synthetic voice technology

By Kurt Knutsson, CyberGuy Report Fox News

Published July 7, 2023 2:14pm EDT

You've probably heard about deepfakes for images and videos, those eerily realistic fakes created with AI. Now Meta (formerly known as Facebook) has developed a new AI model called Voicebox that's all about audio. It's like a supercharged text-to-speech system that can create synthetic voices from just a text prompt.

What is Voicebox?

At its core, Voicebox is an AI model that creates synthetic voices based on simple text prompts. In other words, you give it some text, and it will read it out loud in a voice that sounds human. It's similar to the text-to-speech function you might use on your phone or computer, but it takes things to a whole new level.

One thing that sets Voicebox apart is its ability to replicate specific voice styles based on a very short audio sample – we're talking as little as two seconds! This means you could potentially have a synthetic voice that sounds like your favorite celebrity or even your own voice. It's almost like having a voice actor on demand, ready to read out anything you want in the voice style of your choosing.

Competing AI voice models

Speechify

Speechify and ElevenLabs are also players in the text-to-speech game. Speechify is an app that turns any text into audio. It can read books, articles, notes, emails, PDFs, images, and web pages aloud. Speechify also claims to offer voice cloning, voice editing, and voice sampling features. Speechify offers hundreds of free timeless audiobooks, has a desktop app, and is designed to help people with reading disabilities.

ElevenLabs

ElevenLabs, on the other hand, is a startup that uses AI to generate synthetic voices with context-relevant emotions and natural language understanding. They offer a platform for creating and customizing high-quality spoken audio in any voice and style for various industries, such as video games, animations, digital assistants, education, entertainment, advertising, and podcasting. They also have a tool for detecting synthetic voices and verifying their authenticity. ElevenLabs works with actors who provide their voice samples and get paid when their voice clones are used. They use proprietary deep learning models to create their AI-delivered speeches.

They’re both pretty cool, but they don’t quite have the same versatility as Voicebox, which can mimic real voices from just a few seconds of audio. It’s like comparing a Swiss Army knife to a few really good spoons. They all have their uses, but one is definitely more multipurpose.

The power of Voicebox

But it's not just about creating fake voices. Voicebox can also tidy up your audio by removing annoying background noise – let's say, a dog yapping while you're trying to record. And it's not just about English. This AI speaks French, Spanish, German, Polish and Portuguese, too, and can even translate passages from one language to another while keeping the same voice style.

Meta’s Voicebox: a breakthrough or a threat?

Unfortunately, or fortunately, depending on where you stand regarding AI, Meta isn't planning to open source Voicebox right away. That's got people wondering if they're trying to avoid some potential issues. For example, AI voice tech can be used negatively, like in harassment campaigns. Or, it might be that Meta has some future plans to make some money off this model.

The source of Voicebox’s massive training data

One interesting thing about Voicebox is that it's been trained on a ton of data: over 60,000 hours of speech from English audiobooks and another 50,000 hours from multilingual audiobooks. Meta says they used public domain audiobooks as their main data source, but they also used other sources such as podcasts, speeches, and radio shows. However, public-domain audiobooks come with challenges and limitations, such as quality, consistency, alignment, and speaker identity. Meta claims that they have addressed some of these issues with their data processing and model design.

The double-edged sword of technology

The rise of AI voices is a bit of a touchy subject, especially for voice actors and, more recently, writers. They're worried about companies using AI to synthesize their voices without paying them. The audiobook market has been growing a lot, and companies are always looking to cut costs, so this could end up being another problem for voice professionals.

Don't be mistaken, however; it's not just about jobs. There are some real concerns about how deepfake voices can be used in scams. For instance, there was a case where a synthetic voice impersonating a CEO was used in a major heist. There's also the worry that deepfake voices could be used to fool voice-biometric systems, which are used for things like online banking.

You see, as cool as this technology sounds, there's a darker side to it. Imagine getting a call from your boss asking you to transfer a massive sum of money to close out an account. You do as told because, well, it's your boss. Except, it wasn't. That's right; it was a fake, synthetic voice created using AI that sounded just like your boss. Wild, isn't it? But this isn't some movie plot; it actually happened! This was one of the first times a fake voice was used in a heist, and it left law enforcement and AI experts scratching their heads.

And it's not just heists. Deepfake voices can be used to trick systems that rely on voice recognition. We're talking about things like online banking, which use your voice as a form of identification. If criminals can create a convincing fake voice of you, they could potentially access your accounts. It's a bit like forging a signature but with your voice instead.

Countering the deepfake threat

So, while we're marveling at the amazing things technology can do, it's also important to be aware of the potential risks and to stay one step ahead. It's like a high-tech game of cat and mouse, with AI experts and businesses working hard to spot and stop these deepfake voices before they can do any harm.

Luckily, there are folks out there trying to fight back against the potential misuse of deepfake voices. For example, some countries have started to pass laws to regulate deepfakes. Also, there are projects like the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof), where scientists and engineers are working on ways to counter deepfake voice attacks.

Kurt's key takeaways

We're in an era where tech is evolving at breakneck speed and changing how we work, communicate, and even hear things. While the potential of AI like Meta's Voicebox is undoubtedly exciting, it's clear we also need to tread carefully. There's a fine line between innovation and invasion, a balance we're all still figuring out.

With all these advancements and potential risks, how do you feel about the future of AI and deepfake technology? Do you see it as a boon or a bane? Let us know by writing us at Cyberguy.com/Contact

For more of my security alerts, subscribe to my free CyberGuy Report Newsletter by heading to Cyberguy.com/Newsletter

Copyright 2023 CyberGuy.com.  All rights reserved.

Kurt "CyberGuy" Knutsson is an award-winning tech journalist who has a deep love of technology, gear and gadgets that make life better with his contributions for Fox News & FOX Business beginning mornings on "FOX & Friends." Got a tech question? Get Kurt’s free CyberGuy Newsletter, share your voice, a story idea or comment at CyberGuy.com.
