Robots receive a scary-accurate new voice, courtesy of Google’s DeepMind

Whatever you may think of the robotic voices foisted upon the world thanks to Google Voice Search and Siri, you're unlikely to mistake them for human voices. For years, the state of the art in computer speech synthesis has been stuck at a fairly low level. However, new software called WaveNet, from the brainiacs at DeepMind, is setting a high-water mark in the field of speech synthesis and giving AI a voice eerily similar to that of a human.

For years, roboticists have spoken about something called the uncanny valley: the creepy feeling one gets when observing a robot that is too mechanistic to be mistaken for a human, but not quite mechanical enough to be distinctly robotic, either.

Perhaps one reason there has been no parallel concept for robotic speech is that to date, no speech synthesizer was capable of attaining a quality that came close enough to a human as to be disturbingly similar. With DeepMind's WaveNet, we may be witnessing the emergence of something like an uncanny waveform, a robotic voice close enough to our own as to be distinctly creepy. Or like me, you may just rejoice that finally there's hope for an ebook reader that doesn't sound like the re-animated corpse of a 1980s Commodore computer.

The secret sauce behind this new standard in robotic speech, ironically enough, is artificial intelligence, albeit with a little help from some smart software engineers along the way.

Comparison of text to speech methods as rated by human listeners. Image source: DeepMind, www.deepmind.com

We may as well get used to this situation, as it looks increasingly likely that advancements made in things like robotics and AI will be realized with the help of artificial intelligence itself. While this virtuous feedback loop still includes human intermediaries, a trend towards self-improving AI may be in the offing, along with all the concomitant existential risks this betokens. Regardless, let's take a closer look at WaveNet and see how artificial intelligence has enabled and is, indeed, the backbone behind DeepMind's new speech synthesizer.

To date, most speech synthesizers were of two types: concatenative text to speech and parametric text to speech. Concatenative text to speech is the method behind the so-called "high quality" speech synthesizers used by Google Voice and Siri. It provides a more realistic sound by using large audio files of real people's voices, chopped up and reorganized to form whatever word the computer is enunciating. The downside is that it is difficult to color the speech with changes of emotion or accent.

The alternative method, parametric text to speech, uses a rule-based system discovered by applying statistical models to speech patterns. The stilted and robotic-sounding speech synthesizers are mostly of this latter type, since they rely upon the computer to generate the audio signal rather than recordings of real human voices.
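To make the contrast between the two older methods concrete, here is a toy sketch in Python. It is not how Google Voice, Siri, or any real synthesizer is built: the "recordings" are stand-in sine tones, and every name in it (`RECORDED_UNITS`, `concatenative_synthesize`, `parametric_synthesize`) is hypothetical. It only illustrates the difference between stitching stored audio together and computing the signal from parameters.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (illustrative choice)


def tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a sine tone as a list of floats in [-1, 1]."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Concatenative approach: a bank of pre-recorded fragments.
# These tones are hypothetical stand-ins for clips of a real speaker.
RECORDED_UNITS = {
    "hel": tone(220.0, 0.01),
    "lo": tone(330.0, 0.02),
}


def concatenative_synthesize(units):
    """Look up each stored fragment and join them end to end."""
    audio = []
    for unit in units:
        audio.extend(RECORDED_UNITS[unit])
    return audio


# Parametric approach: no recordings, the signal is computed from parameters.
def parametric_synthesize(frames):
    """frames is a list of (pitch_hz, duration_s, loudness) tuples."""
    audio = []
    for pitch_hz, duration_s, loudness in frames:
        audio.extend(loudness * s for s in tone(pitch_hz, duration_s))
    return audio


stitched = concatenative_synthesize(["hel", "lo"])
computed = parametric_synthesize([(220.0, 0.01, 0.5), (330.0, 0.02, 0.8)])
```

In the parametric version, changing pitch or loudness is a matter of passing different numbers, whereas the concatenative version can only play back what was recorded. That is the trade-off described above: realism from recordings versus flexibility from a generated signal.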

The WaveNet system can be thought of as an improvement upon concatenative text to speech, in that it still employs recordings of real human voices. But instead of chopping these up and reorganizing them in the old way, it uses an artificial neural network to generate synthetic utterances based upon the voices it was trained with. The downside is that this system is computationally intensive. Modeling raw audio typically requires 16,000 samples per second, with each sample being influenced by all the previous ones. This is well beyond the processing power of a typical smartphone, but not unthinkable for GPU-based systems like Nvidia's DGX-1 deep learning supercomputer.
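A rough sense of why this is expensive can be had from a minimal Python sketch of sample-by-sample generation. This is an illustration under stated assumptions, not DeepMind's implementation: `toy_predict_next` is a hypothetical stand-in for the trained neural network, and here it is just a cheap two-tap formula that produces a decaying oscillation. The point is the shape of the loop: one prediction per sample, each one made only after all earlier samples exist.

```python
SAMPLE_RATE = 16_000  # samples per second, the figure quoted above


def toy_predict_next(history):
    """Hypothetical stand-in for a trained network.

    A real model would be a large neural network conditioned on the
    previous samples; this toy rule only looks at the last two.
    """
    if len(history) < 2:
        return 0.1
    return 1.9 * history[-1] - 0.95 * history[-2]


def generate(duration_s, predict_next=toy_predict_next, sample_rate=SAMPLE_RATE):
    """Generate audio one sample at a time, feeding the output back in."""
    audio = []
    for _ in range(round(duration_s * sample_rate)):
        # Each new sample depends on everything generated so far,
        # so the steps cannot simply be computed in parallel.
        audio.append(predict_next(audio))
    return audio


one_second = generate(1.0)
print(len(one_second))  # 16000 predictions for one second of audio
```

With a toy predictor the loop finishes instantly, but if each call were a full neural network evaluation, one second of speech would require 16,000 of them in sequence.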

DeepMind has some audio samples posted on its WaveNet page if you want to hear what it sounds like. While you're unlikely to meet WaveNet out in the wild for the time being, it's not unthinkable that this system will someday power the voice on your ebook reader or a smart home console; that is, if a recursive self-improving AI hasn't obliterated humankind first.

Now read: Artificial neural networks are changing the world. What are they?