Gary Marcus – TRANSCRIPT
One of the biggest reasons I work in artificial intelligence is because I think it genuinely has the potential to change the world. I think there are a lot of problems that we scientists can’t solve on our own, that our brains basically aren’t big enough to handle the complexity of.
So curing cancer, understanding how the brain works, reducing energy consumption, curing mental illness. These are all really complex problems. To take just one example, this is a diagram of all the genes involved in one tiny part of the brain in Alzheimer’s. It’s an incredibly complex network and we don’t understand, as individuals, how all of these things relate to one another. We would like computers to be able to help.
The trouble is we’re not making as much progress in artificial intelligence, I think, as the world seems to think. You read headlines today and they’re all about “deep learning”: “Scientists See Promise In Deep-Learning Programs,” “‘Deep Learning’ Will Soon Give Us Super Smart Robots.” That’s what most people think. I’m actually not so sure.
Intelligence, it’s important to remember, is not homogeneous. There are lots of things that go into intelligence. There’s perception, common sense, planning, analogy, language, reasoning. These are all part of what we’d call “intelligence,” and many more things. If you know Howard Gardner’s notion of multiple intelligences, I think it’s fundamentally right.
There are lots of things that go into intelligence. We’ve made enormous progress in AI, but really just in one piece of that, which is perception. And even in perception, we haven’t got it all figured out yet. Here’s something machines can do very well: they can identify a person. You train them on a lot of data about some celebrities, and sure enough, it identifies that this is Tiger Woods. Once in a while, it might get confused and think it’s a golf ball, but it will probably never tell you that this is Angelina Jolie.
The way that we do this nowadays is with big data. We derive statistical approximations to the world from that big data. The most common technique for this now is called a convolutional neural network, which was invented by my NYU colleague, Yann LeCun. The idea is you have a series of inputs into the system with labels on them.
So this is a robot, you get told it’s a robot. The system either gets that correct or wrong; if it gets it wrong, you adjust the stuff in-between. The stuff in-between is a set of nodes which are modeled on neurons, very loosely modeled on neurons, and you’ll see layers there going from left to right. The idea is you start by detecting low-level things about the image, like differences between light and dark, and you move up some hierarchy to things like lines, circles, and curvy parts, until at the top of the hierarchy, you have things like Tiger Woods or Oprah Winfrey, or what have you.
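To make the mechanism just described concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not the architecture from the talk: the function names, the tiny images, and the learning rate are all hypothetical, the light/dark detector is written by hand (a real convolutional network learns its detectors from data), and there is only one layer with one adjustable weight. It shows just the two ideas: a small filter that responds to light/dark differences, and a rule that changes the weights only when the labeled example is classified wrongly.

```python
# Toy illustration only; names, data, and constants are assumptions.

def convolve2d(image, kernel):
    """Slide a small kernel over a 2-D image (valid positions only)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Hand-written low-level detector: responds where dark (0) meets light (1)
# going left to right. A real network would learn this kernel.
EDGE_KERNEL = [[-1.0, 1.0]]

def features(image):
    """One feature: the strongest dark-to-light response in the image."""
    response = convolve2d(image, EDGE_KERNEL)
    return [max(max(row) for row in response)]

def score(model, image):
    weights, bias = model
    return weights[0] * features(image)[0] + bias

def predict(model, image):
    return 1 if score(model, image) > 0 else 0

def train(examples, epochs=20, lr=0.5):
    """Perceptron-style rule: adjust the weights only when the guess is wrong."""
    weights, bias = [0.0], 0.0
    for _ in range(epochs):
        for image, label in examples:
            error = label - predict((weights, bias), image)  # 0 if correct
            weights[0] += lr * error * features(image)[0]
            bias += lr * error
    return weights, bias

HAS_EDGE = [[0, 0, 1, 1], [0, 0, 1, 1]]  # labeled 1: contains a dark-to-light edge
FLAT = [[1, 1, 1, 1], [1, 1, 1, 1]]      # labeled 0: uniformly light
model = train([(HAS_EDGE, 1), (FLAT, 0)])
```

After training, `predict(model, HAS_EDGE)` gives 1 and `predict(model, FLAT)` gives 0. The sketch only sorts inputs into labeled categories, which is the kind of task the talk says these systems handle well.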
As I say, it works perfectly fine for simple categorization. But it doesn’t work the minute the problem gets a little bit harder. Suppose you see an image like this. You might be able to get your neural network – as these things are called – to recognize the barbell. That wouldn’t be that hard. If you’re lucky, it would recognize the dog. It might not, because the ears are in a configuration that dog ears almost never appear in.
Sort of straight out, right? That might actually stump the machine. But whether or not that stumped the machine, I’m pretty sure that your neural net would not be able to tell you that this is an unusual scene; that you don’t see a dog doing a bench-press every day, and this is something out of the ordinary.
Here’s another example: This ran in the New York Times when this came out. This was a paper that said, “Hey, wow! These deep learning systems, they can caption images now!” If they really could do that perfectly, I would be really impressed. But what we’ve got now is certain cases that work really well, and others, not so well.
Here’s a case that works really well: A person riding a motorcycle on a dirt road. You show the computer this image and it gives you the right answer. Here on the right, it gets it right too: Group of young people playing a game of Frisbee. If you just look at these examples, you’d say, “We’ve solved the problem. The machine understands what’s going on.”
Then you show it this one. I would say the correct answer is maybe a parking sign with stickers on it, but you could describe it differently. None of those would be what the machine gives you, which is refrigerator filled with lots of food and drinks. Makes me think of the Oliver Sacks book, “The Man Who Mistook His Wife for a Hat.” If my child did this, I would think there were neurological problems.
I would rush them to the doctor. The system doesn’t really understand what a parking sign is, what a sticker is, what a refrigerator is, what drinks are, so it’s looking for the nearest thing in its database, which is this melange of colors, but that’s not really understanding.
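The "nearest thing in its database" picture can be sketched as a nearest-neighbor lookup. This is only an illustration of the speaker's metaphor, not a description of how the captioning system actually works: the database, the average-color values, and the function names below are all invented for the example.

```python
# Hypothetical illustration; every number here is made up.
# Each stored caption is paired with an invented average (R, G, B) color.
DATABASE = {
    "refrigerator filled with lots of food and drinks": (200, 120, 90),
    "a person riding a motorcycle on a dirt road": (120, 100, 70),
    "group of young people playing a game of Frisbee": (90, 160, 80),
}

def squared_distance(a, b):
    """Squared Euclidean distance between two color triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_caption(color, database=DATABASE):
    """Return the caption whose stored color is closest to `color`."""
    return min(database, key=lambda cap: squared_distance(database[cap], color))

# Invented average color for a parking sign covered in colorful stickers.
sticker_covered_sign = (195, 125, 95)
caption = nearest_caption(sticker_covered_sign)
```

With these made-up numbers, the lookup returns the refrigerator caption because the colors happen to be close. Nothing in the procedure represents what a sign, a sticker, or a refrigerator is.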
I’m about to show you a video that IEEE Spectrum put together last year after the DARPA competition. DARPA was trying to help people build “emergency robots,” and people did all kinds of work in their labs to build robots that could do things like open doors in case of emergency. People were thinking of the Fukushima event, so you want to send robots in where you can’t send people.