Our natural language processing is what helps us make Google truly conversational with our users. And we have built state-of-the-art machine translation, image recognition and voice recognition systems. And each of these areas is being turbocharged by the progress we are seeing with Machine Learning and AI.
A few months ago, we captured the world’s attention when DeepMind’s AlphaGo won the World Go championship against Lee Sedol, one of the finest players of our generation. It showed the external world that a moment for AI has arrived, but for us, the progress has been continuous and the strides are huge. In fact, in the three months since AlphaGo played that game, we have had meaningful launches in how Machine Learning is impacting the products we build.
Let me talk about three examples, all of which we have talked about in the past three months since the AlphaGo moment. First, image captioning. Image captioning is how computers try to make sense of images they look at. And we first launched our Machine Learning system in 2014. It was a state-of-the-art system and our quality was just over 89%. With our newer Machine Learning systems, the quality is now close to 94%. 4% may not sound like much to you, but first, it’s really hard to increase quality at these levels because we are trying to approach human level accuracy.
And, second, every single percentage point translates into a meaningful difference for our users. So, for example, if you take a look at the picture behind me, about two years ago, we used to understand this picture as “a train is sitting on the tracks.” Now, we understand the colors, so we describe it as “a blue and yellow train is traveling down the tracks.”
Or if you look at this picture, two years ago we understood it as “a brown bear is swimming in the water.” Now, we can count, our computing systems can count, so we understand this is “two brown bears sitting on top of rocks.” Advances like this are what help us, when you are in Google Photos, find the exact pictures you are looking for and be a better assistant to you.
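One way to see why a few percentage points matter at these levels is to look at the errors that remain rather than at the quality score. The sketch below is only illustrative arithmetic: it rounds the figures from the talk to 89% and 94% (the talk says “just over” and “close to”), and it assumes the quality score can be read as the share of captions judged acceptable. The function names are hypothetical.

```python
def error_rate(quality_pct: float) -> float:
    """Share of captions that are still wrong, in percent."""
    return 100.0 - quality_pct


def relative_error_reduction(old_quality_pct: float, new_quality_pct: float) -> float:
    """Fraction of the old errors that the newer system no longer makes."""
    old_err = error_rate(old_quality_pct)
    new_err = error_rate(new_quality_pct)
    return (old_err - new_err) / old_err


# Rounded figures from the talk (assumption: 89% in 2014, 94% for the newer system).
old_quality = 89.0
new_quality = 94.0

reduction = relative_error_reduction(old_quality, new_quality)
print(f"Errors: {error_rate(old_quality):.0f}% -> {error_rate(new_quality):.0f}%")
print(f"Relative error reduction: {reduction:.0%}")
```

Under these rounded numbers the remaining errors drop from 11% to 6%, so a gain of a few points in quality removes close to half of the mistakes a user would have seen.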
Another example, machine translation. We have been doing machine translation for a while. And historically our systems have been statistics based, and we translate at a phrase-by-phrase level. So we translate individual phrases and combine them to form a translation. So if you look at this Chinese to English translation, you can see it makes sense, but it’s not quite the way humans would translate it. Just recently we announced our first end-to-end self-learning, deep learning machine translation systems. Rather than working at a phrase level, they take entire sentences and model sentences as outputs. And that’s what you see in the middle, and you can see it is approaching human level translation. You can look at this quantitatively, and we have a beta to measure these things quantitatively, and if you look at our previous phrase-based system, it was quite far from the human system, and we closed a significant gap with our new Machine Learning systems.
In fact, the progress for Chinese to English is so significant that last week we rolled it out in production, and so today if you pick up the Google Translate mobile app and try to translate from Chinese to English, you are using our newer Machine Learning systems, and the progress has been amazing. We’ll literally translate billions of queries over the coming year. This is what will help us when a user in Indonesia is using the Google Assistant: we can find the right answer even if it doesn’t exist in their local language, translate it on the fly and get it to them.
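The phrase-by-phrase approach described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Google’s system: the phrase table is invented for the example, the input is already split into words, and the lookup is a simple greedy longest match, whereas real statistical systems score many candidate segmentations and orderings with learned probabilities.

```python
# Hypothetical, hand-written phrase table (assumption: invented for this sketch).
PHRASE_TABLE = {
    ("我",): "I",
    ("非常", "喜欢"): "really like",
    ("喜欢",): "like",
    ("猫",): "cats",
}


def translate_phrase_by_phrase(tokens, table=PHRASE_TABLE):
    """Translate by greedily matching the longest known phrase at each position."""
    max_len = max(len(key) for key in table)
    pieces = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                pieces.append(table[key])
                i += n
                break
        else:
            # Unknown word: pass it through unchanged.
            pieces.append(tokens[i])
            i += 1
    # Translated phrases are simply joined in source order.
    return " ".join(pieces)


print(translate_phrase_by_phrase(["我", "非常", "喜欢", "猫"]))
```

Because each phrase is translated on its own and the pieces are joined in source order, the output can make sense while still reading less naturally than a human translation. A sentence-level model instead takes the whole sentence as input and produces the whole sentence as output.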
Text to speech
Another example, text to speech. Text to speech is what we call it when computers read something aloud back to you. So when you ask Google a question: Who is the Prime Minister of Canada? We understand the text and try to make it as natural as possible for you.
[Google Assistant: Justin Trudeau is the prime minister of Canada]
So this is text to speech. The way we do it today, we get a speaker into our recording studios and we record them for thousands of hours. We make them say short phrases and then combine them to be as natural sounding as possible. Again, deep learning is showing the way. DeepMind just published a paper with a new technology called WaveNet. It’s a deep learning model where, rather than modeling phrases, they actually model the underlying raw audio waves to generate a much more natural sound. You can again see the WaveNet model is getting much closer to human speech. To me, the reason this gets exciting is that today all we can do is a single voice for the assistant for all contexts. Doing this is what will enable us to have multiple voices, multiple personalities, get the assistant to differentiate between German and Swiss German and one day even capture emotions when speaking to you. This is key to our vision for building an individual Google for each user, and more importantly, the assistant will continuously get better as we make progress with Machine Learning and AI.
It is early days but we are committed to this vision and we are going to work on it for a long time. But it’s equally important to get the assistant in the hands of our users. And that’s what today is about.
In fact, we started doing it about two weeks ago with our new messaging app, Google Allo, in which users can invoke the Google Assistant in group conversations. And the early reception has been great. They are interacting with it very naturally, asking us questions we expected like tell me a joke, but also questions we didn’t expect like what is the meaning of life. So it’s early days, but the assistant continuously learns from this experience and keeps getting better.
If you remember, our vision for the Google Assistant is to be universal, to be there everywhere the user needs it to be, which is why today we are going to bring the assistant to two more surfaces, one in the context of the phone which you always carry with you, and one in the context of your home.
To talk about the Google Assistant in new hardware products, let me invite Rick Osterloh, the head of our newly formed hardware group.
Rick Osterloh – SVP, Hardware Group
Thank you. Good morning. I am Rick. It’s an honor to be here today representing the hard work of so many of my colleagues. Well, I’ve been doing hardware for a long time and even I smile like a kid every time I get to unbox a new gadget.
Since I joined Google, one of the questions I get asked most often is: why should we build hardware? I mean we often joke that building hardware is, well, hard. People have strong, emotional connections to the products they rely on every day. They are an important part of our users’ lives. But the rise in the volume and complexity of all of the information makes it so that this is the right time to be focused on hardware and software. Let’s think about that for a moment.
At the peak of film photography, 80 billion photos were captured every year. But last year thanks to smartphones, 1 trillion photos were taken. Communications has gotten similarly complex. 328 billion items were delivered by the Post Office last year. And that’s compared to 50 trillion emails and mobile messages. And today people want more than a thousand songs in their pocket. What they want is the entire world’s music collection with them at all times. These informational changes mean the technology needs to be smart, and just work for you. This is why we believe the next big innovation is going to take place at the intersection of hardware and software with AI at the center. That’s where we have the biggest opportunity to bring people the very best of Google as we intended it.
Building hardware and software together lets us take full advantage of capabilities like the Google Assistant. It lets us harness years of expertise we have built up in Machine Learning and AI to deliver the simple, smart and fast experiences that our users expect from us. It allows Google to be helpful to people where they need us no matter what the context or form factor.
As you will see today, we are building hardware with the Google Assistant at its core so you can get things done without worrying about the underlying technology. Our devices just work for you, whether you are at home, with family, commuting to work, out for a jog or spending time with friends. This is something that Google has always stood for.
Hardware isn’t a new area for Google, but now we are taking steps to showcase the very best of Google across a family of devices designed and built by us. This is a natural step and we are in it for the long run. You are going to hear much more from our team in the coming months and years, and we have lots in store for you today.
So let’s get started, first with phones. Phones are the most important device we own. They rarely leave our side. They are literally most people’s lifeline to the Internet and to each other. So today I am very excited to introduce you to a new phone made by Google. We call it Pixel. For those of you who have followed Google for a while, that name might sound familiar to you. For us, it’s always represented the best of hardware and software designed and built by Google.