Here is the full transcript of the Google I/O 2017 developer conference keynote, in which Google CEO Sundar Pichai and his team announced the company’s latest products and services. The event took place on Wednesday, May 17, 2017, at Shoreline Amphitheatre in Mountain View, CA.
Speakers at the event:
Sundar Pichai – CEO, Google
Scott Huffman – Vice President, Assistant
Valerie Nygaard – Product manager for Google Assistant
Rishi Chandra – VP of Home Products
Anil Sabharwal – Head of Google Photos
Susan Wojcicki – CEO, YouTube
Sarah Ali – Head of Living Room Products
Barbara MacDonald – Product Manager at YouTube
Dave Burke – VP of Engineering, Android
Stephanie Cuthbertson – Group Product Manager, Android Studio
Sameer Samat – VP of Android and Play
Clay Bavor – VP, Virtual Reality at Google
Sundar Pichai – CEO, Google
Good morning. Welcome to Google I/O.
[Audience: We love you, Sundar!]
I love you guys, too. Can’t believe it’s one year already. It’s a beautiful day. We’re being joined by over 7,000 people, and we are live-streaming this, as always, to over 400 events in 85 countries.
Last year was the tenth year since Google I/O started, and so we moved it closer to home at Shoreline, back where it all began. It seems to have gone well. I checked the Wikipedia entry from last year. There were some mentions of sunburn. So we have plenty of sunscreen all around. It’s on us. Use it liberally.
It’s been a very busy year since last year. No different from my 13 years at Google. That’s because we’ve been focused ever more on our core mission of organizing the world’s information. And we are doing it for everyone, and we approach it by applying deep computer science and technical insights to solve problems at scale.
That approach has served us very, very well. This is what has allowed us to scale up seven of our most important products and platforms to over 1 billion monthly active users each. And it’s not just the scale at which these products are working, users engage with them very heavily.
YouTube not only has over 1 billion users, but every single day, users watch over 1 billion hours of video on YouTube. With Google Maps, every single day, users navigate over 1 billion kilometers. So the scale is inspiring to see, and there are other products approaching this scale.
We launched Google Drive five years ago, and today, it has over 800 million monthly active users. And every single week, there are over 3 billion objects uploaded to Google Drive.
Two years ago, at Google I/O, we launched Photos as a way to organize users’ photos using machine learning. And today, we have over 500 million active users. And every single day, users upload 1.2 billion photos to Google. So the scale of these products is amazing. But they are all still working their way up towards Android, where I’m excited to say that, as of this week, we crossed over 2 billion active devices. As you can see, the robot is pretty happy, too, behind me.
So it’s a privilege to serve users at this scale. And this is all because of the growth of mobile and smartphones. But computing is evolving again. We spoke last year about this important shift in computing, from a mobile-first to an AI-first approach.
Mobile made us reimagine every product we were working on. We had to take into account that the user interaction model had fundamentally changed, with multitouch, location, identity, payments, and so on. Similarly, in an AI-first world, we are rethinking all our products and applying machine learning and AI to solve user problems. And we are doing this across every one of our products. So today, if you use Google Search, we rank differently using machine learning.
Or if you’re using Google Maps, Street View automatically recognizes restaurant signs and street signs using machine learning. Duo with video calling uses machine learning for low-bandwidth situations. And Smart Reply in Allo last year had a great reception. And so today, we are excited that we are rolling out Smart Reply to over 1 billion users of Gmail. It works really well. Here’s a sample email. If you get an email like this, the machine learning systems learn to be conversational, and it can reply: “I’m fine with Saturday”, or whatever. So it’s really nice to see.
Just like with every platform shift, how users interact with computing changes. Mobile brought multitouch. We evolved beyond keyboard and mouse. Similarly, we now have voice and vision as two new important modalities for computing. Humans are interacting with computing in more natural and immersive ways.
Let’s start with voice. We’ve been using voice as an input across many of our products. That’s because computers are getting much better at understanding speech. We have had significant breakthroughs. But the pace even since last year has been pretty amazing to see. Our word error rate continues to improve even in very noisy environments. This is why, if you speak to Google on your phone or Google Home, we can pick up your voice accurately, even in noisy environments.
When we were shipping Google Home, we had originally planned to include eight microphones so that we could accurately locate where the user was speaking from. But thanks to deep learning, using a technique called neural beamforming, we were able to ship it with just two microphones and achieve the same quality.
Deep learning is what allowed us about two weeks ago to announce support for multiple users in Google Home, so that we can recognize up to six people in your house and personalize the experience for each and every one. So voice is becoming an important modality in our products.
The same thing is happening with vision. Similar to speech, we are seeing great improvements in computer vision. So when we look at a picture like this, we are able to understand the attributes behind the picture. We realize it’s your boy in a birthday party, there was cake and family involved, and your boy was happy. So we can understand all that better now.
And our computer vision systems now, for the task of image recognition, are even better than humans. So it’s astounding progress. And we are using it across our products. So if you use the Google Pixel, it has the best-in-class camera, and we do a lot of work with computer vision. You can take a low-light picture like this, which is noisy, and we automatically make it much clearer for you.
Or coming very soon, if you take a picture of your daughter at a baseball game and there is something obstructing it, we can do the hard work, remove the obstruction and have the picture of what matters to you in front of you. We are clearly at an inflection point with vision, and so today, we are announcing a new initiative called Google Lens.
Google Lens is a set of vision-based computing capabilities that can understand what you’re looking at and help you take action based on that information. We’ll ship it first in Google Assistant and Photos, and it’ll come to other products.
So how does it work? So for example, if you run into something and you want to know what it is, say a flower, you can invoke Google Lens from your Assistant, point your phone at it and we can tell you what flower it is. It’s great for someone like me with allergies.
Or, if you’ve ever been at a friend’s place and you’ve crawled under a desk just to get the username and password from a Wi-Fi router, you can point your phone at it and we can automatically do the hard work for you.
Or, if you’re walking in a street downtown and you see a set of restaurants across from you, you can point your phone at them. Because we know where you are, and we have our Knowledge Graph, and we know what you’re looking at, we can give you the right information in a meaningful way.
As you can see, we are beginning to understand images and videos. All of Google was built because we started understanding text and web pages. So the fact that computers can understand images and videos has profound implications for our core mission.
When we started working on Search, we wanted to do it at scale. This is why we rethought our computational architecture. We designed our data centers from the ground up, and we put a lot of effort into them. Now that we are evolving for this machine learning and AI world, we are rethinking our computational architecture again. We are building what we think of as AI-first data centers. This is why last year we launched the Tensor Processing Units (TPUs). They’re custom hardware for machine learning. They were about 15 to 30 times faster and 30 to 80 times more power efficient than CPUs and GPUs at that time. We use TPUs across all our products: every time you do a search, every time you speak to Google. In fact, TPUs are what powered AlphaGo in its historic match against Lee Sedol.
As you know, machine learning has two components: training and inference. Training is how we build the neural net, and it is very computationally intensive. Inference is what we do in real time, so that when you show it a picture, we recognize whether it’s a dog or a cat and so on. Last year’s TPUs were optimized for inference. Training is computationally very intensive. To give you a sense, each one of our machine translation models takes a training of over 3 billion words for a week on about 100 GPUs. So we’ve been working hard, and I’m really excited to announce our next generation of TPUs: Cloud TPUs, which are optimized for both training and inference.
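[Editor’s note: the split between training and inference described here can be illustrated with a toy model. The following is a minimal sketch in plain Python, not the method used for Google’s models. The function names `train` and `infer`, the sample data, and the learning rate are all illustrative assumptions. Training loops over the data many times to fit the parameters, while inference is a single cheap evaluation of the fitted model.]

```python
# Toy illustration only: a one-variable linear model fitted by
# stochastic gradient descent. All names and values are hypothetical.

def train(samples, epochs=200, lr=0.1):
    """Training: repeatedly adjust weight and bias to reduce squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):          # many passes over the data (the costly part)
        for x, y in samples:
            err = (w * x + b) - y    # prediction error on this sample
            w -= lr * err * x        # gradient step for the weight
            b -= lr * err            # gradient step for the bias
    return w, b


def infer(model, x):
    """Inference: one evaluation of the already-trained model."""
    w, b = model
    return w * x + b


# Samples drawn from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
model = train(data)
print(infer(model, 3.0))  # close to 7.0, since the data follow y = 2x + 1
```

[The sketch makes the cost asymmetry visible: `train` performs hundreds of updates, whereas `infer` is a single multiply and add.]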
What you see behind me is one Cloud TPU board. It has four chips in it, and each board is capable of 180 trillion floating point operations per second. And, you know, we have designed it for a data center so you can easily stack them. You can put 64 of these into one big supercomputer. We call these TPU pods, and each pod is capable of 11.5 petaflops. It is an important advance in technical infrastructure for the AI era. The reason we named it Cloud TPU is because we’re bringing it through the Google Cloud Platform. So Cloud TPUs are coming to Google Compute Engine as of today.
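[Editor’s note: the pod figure follows from the per-board figure. The short Python sketch below redoes the arithmetic using only the numbers quoted in the keynote. The constant and function names are illustrative. The per-chip value assumes throughput is split evenly across the four chips on a board, which the keynote does not state.]

```python
# Figures quoted in the keynote.
TFLOPS_PER_BOARD = 180   # trillion floating point operations per second per board
CHIPS_PER_BOARD = 4      # chips on one Cloud TPU board
BOARDS_PER_POD = 64      # boards combined into one TPU pod


def pod_petaflops(boards=BOARDS_PER_POD, tflops_per_board=TFLOPS_PER_BOARD):
    """Aggregate peak throughput of a pod in petaflops (1 petaflop = 1,000 teraflops)."""
    return boards * tflops_per_board / 1000


def per_chip_tflops(tflops_per_board=TFLOPS_PER_BOARD, chips=CHIPS_PER_BOARD):
    """Per-chip throughput, ASSUMING an even split across chips (not stated in the talk)."""
    return tflops_per_board / chips


print(pod_petaflops())    # 11.52, which the keynote rounds to 11.5 petaflops
print(per_chip_tflops())  # 45.0 under the even-split assumption
```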