Artificial intelligence & robotics

Aäron van den Oord

His AI system creates artificial voices that sound remarkably human.

Year Honored: 2021

Organization: DeepMind

Region: Global

Hails From: US

In 2016, Aäron van den Oord had just won an award for his research in image generation when he was struck by an idea. If his technique could learn to predict a two-dimensional sequence of pixels, could it also learn to predict a waveform and thus generate realistic voices? The idea was intriguing but seemed like a long shot. His manager at DeepMind, an AI research subsidiary of Google, gave him two weeks to try it out, saying that if it didn’t work, he should move on to something else.

The results beat everyone’s expectations. Within two weeks, van den Oord had a prototype. Within three months, it was generating more realistic voices than any existing system. Within another year, Google had begun using WaveNet, as the system came to be called, to generate voices for Google Assistant.

WaveNet now powers 51 voices as well as Duplex, Google’s newest voice assistant, which calls salons and restaurants on behalf of users to book appointments or reserve tables. The results are startlingly realistic. When Google CEO Sundar Pichai first demoed Duplex in 2018, with all its human-like “umms” and “ahs,” it set a new bar for what is possible when people communicate with machines.

While voice assistants need to do more than just generate a synthetic voice (they also need to be able to recognize when someone is talking and understand what’s being said, each of which is a challenge unto itself), researchers have long sought to create the right artificial voice for achieving natural and engaging conversations. “There’s a lot of meaning in a voice,” says van den Oord.