‘Universal translator’ dubs and lip-syncs speakers – but Google warns against misuse


Google is testing a powerful new translation service that redubs video in a new language while also synchronizing the speaker’s lips with words they never spoke. It could be very useful for a lot of reasons, but the company was upfront about the possibility of abuse and the steps taken to prevent it.

“Universal Translator” was shown off at Google I/O during a presentation by James Manyika, who heads up the company’s new “Technology and Society” department. It was offered as an example of something that advances in AI have only recently made possible, but that also presents serious risks that have to be reckoned with from the start.

The “experimental” service takes an input video, in this case a lecture from an online course originally recorded in English, transcribes the speech, translates it, regenerates the speech (matching style and tone) in that language, and then edits the video so that the speaker’s lips more closely match the new audio.
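The four steps described above can be pictured as a simple chain of stages. The following is only a minimal sketch of that flow: Google has not published how the service is built, so every name here (`Clip`, `transcribe`, `translate`, `synthesize_speech`, `sync_lips`, `redub`) is hypothetical, and each stage is a string-tagging placeholder standing in for a real model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    """Hypothetical container; strings stand in for real media data."""
    video: str
    audio: str
    language: str
    transcript: str = ""


def transcribe(clip: Clip) -> Clip:
    # Placeholder: a real system would run speech recognition here.
    return replace(clip, transcript=f"transcript({clip.audio})")


def translate(clip: Clip, target: str) -> Clip:
    # Placeholder: a real system would run machine translation here.
    return replace(clip, transcript=f"{target}:{clip.transcript}", language=target)


def synthesize_speech(clip: Clip) -> Clip:
    # Placeholder: a real system would regenerate speech matching style and tone.
    return replace(clip, audio=f"speech({clip.transcript})")


def sync_lips(clip: Clip) -> Clip:
    # Placeholder: a real system would edit the frames to match the new audio.
    return replace(clip, video=f"lipsync({clip.video}, {clip.audio})")


def redub(clip: Clip, target: str) -> Clip:
    """Run the four stages in the order the article describes."""
    clip = transcribe(clip)
    clip = translate(clip, target)
    clip = synthesize_speech(clip)
    return sync_lips(clip)


if __name__ == "__main__":
    lecture = Clip(video="lecture.mp4", audio="lecture.wav", language="en")
    print(redub(lecture, "es"))
```

The point of the sketch is the ordering: each stage consumes the previous stage's output, which is why the final lip-sync step depends on the regenerated audio rather than the original recording.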

So it’s basically a deepfake generator, right? Yes, but the technology that’s used for malicious purposes elsewhere has genuine utility. There are actually companies that do this kind of thing right now in the media world, redubbing lines in post-production for any of a dozen reasons. (The demo was impressive, but it must be said the tech still has a way to go.)

But those tools are professional ones being made available in a strict media workflow, not a checkbox on a YouTube upload page. Neither is Universal Translator — yet — but if it is ever to be so, Google needs to reckon with the possibility of it being used to create disinformation or other unforeseen hazards.

Manyika called this a “tension between boldness and safety,” and striking a balance can be difficult. But clearly it can’t just be released widely for anyone to use with no restrictions. Yet the benefits — for example, making an online course available in 20 languages without subtitles or re-recording — are undeniable.

“This is an enormous step forward for learning comprehension, and we’re seeing promising results in course completion rates,” Manyika said. “But there’s an inherent tension here: Some of the same underlying technology could be misused by bad actors to create deepfakes. So we built the service with guardrails to prevent misuse, and we make it accessible only to authorized partners. Soon we’ll be integrating new innovations in watermarking into our latest generative models to also help with the challenge of misinformation.”


That’s certainly a start, but we’ve seen how those same bad actors are highly capable when it comes to circumventing such roadblocks. The “guardrails” are a bit hand-wavy, and sharing solely with partners works only so long as the model doesn’t leak, as models tend to. Watermarking is a good path to pursue as well, of course, but so far most approaches to that have been defeated by trivial edits like cropping, resizing, and other minor manipulations to the watermarked media.
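To see why a crop can be enough, consider a deliberately naive toy scheme, sketched below. It is an assumption made purely for illustration and not the watermarking Google described: the mark is written into the lowest bit of each pixel in row-major order, so reading it back depends on the image keeping its original width. The helper names (`embed`, `extract`, `crop_left`) are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel values, one inner list per row


def embed(image: Image, bits: List[int]) -> Image:
    """Write bits into the least significant bit of pixels, row by row."""
    out = [row[:] for row in image]
    width = len(out[0])
    for i, bit in enumerate(bits):
        r, c = divmod(i, width)
        out[r][c] = (out[r][c] & ~1) | bit
    return out


def extract(image: Image, count: int) -> List[int]:
    """Read back the first `count` least significant bits, row by row."""
    width = len(image[0])
    return [image[i // width][i % width] & 1 for i in range(count)]


def crop_left(image: Image, cols: int) -> Image:
    """Drop `cols` columns from the left edge, as a trivial edit would."""
    return [row[cols:] for row in image]


if __name__ == "__main__":
    blank = [[128] * 8 for _ in range(4)]
    mark = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
    marked = embed(blank, mark)
    print(extract(marked, len(mark)) == mark)                # intact image
    print(extract(crop_left(marked, 1), len(mark)) == mark)  # after a 1-column crop
```

Removing a single column shifts every remaining pixel, so the extractor reads the wrong positions and the mark no longer matches. More robust schemes try to survive this kind of edit, which is the harder problem the paragraph above alludes to.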

Google demonstrated a lot of AI capabilities today, both new and familiar, but whether and how they will be both useful and safe is still something of a mystery. But giving someone like Manyika (a researcher himself) stage time at its biggest event to say “wow, this could be bad so we’re doing this and that, who knows if it will work” is at least a fairly honest way to approach the problem.