Why Toddlers are Smarter Than Computers: Gary Marcus at TEDxCERN (Transcript)

Gary Marcus – TRANSCRIPT

One of the biggest reasons I work in artificial intelligence is because I think it genuinely has the potential to change the world. I think there are a lot of problems that we scientists can’t solve on our own, that our brains basically aren’t big enough to handle the complexity of.

So curing cancer, understanding how the brain works, reducing energy consumption, curing mental illness. These are all really complex problems. To take just one example, this is a diagram of all the genes involved in one tiny part of the brain in Alzheimer’s. It’s an incredibly complex network and we don’t understand, as individuals, how all of these things relate to one another. We would like computers to be able to help.

The trouble is we’re not making as much progress in artificial intelligence, I think, as the world seems to think. You read headlines today and they’re all about “deep learning”: “Scientists See Promise In Deep-Learning Programs,” “‘Deep Learning’ Will Soon Give Us Super Smart Robots.” That’s what most people think. I’m actually not so sure.

Intelligence, it’s important to remember, is not homogeneous. There are lots of things that go into intelligence. There’s perception, common sense, planning, analogy, language, reasoning. These are all part of what we’d call “intelligence,” and many more things. If you know Howard Gardner’s notion of multiple intelligences, I think it’s fundamentally right.

There are lots of things that go into intelligence. We’ve made enormous progress in AI, but really just in one piece of that, which is perception. And even in perception, we haven’t got it all figured out yet. Here’s something machines can do very well: they can identify a person. You train them on a lot of data about some celebrities, and sure enough, it identifies that this is Tiger Woods. Once in a while, it might get confused and think it’s a golf ball, but it will probably never tell you that this is Angelina Jolie.

The way that we do this nowadays, is with big data. We derive statistical approximations to the world from that big data. The most common technique for this now is called a convolutional neural network, which was invented by my NYU colleague, Yann LeCun. The idea is you have a series of inputs into the system with labels on them.

So this is a robot; you get told it’s a robot. The system either gets that right or wrong, and if it gets it wrong, you adjust the stuff in between. The stuff in between is a set of nodes which are modeled on neurons, very loosely modeled on neurons, and you’ll see layers there going from left to right. The idea is you start by detecting low-level things about the image, like differences between light and dark, and you move up some hierarchy to things like lines, circles, and curvy parts, until at the top of the hierarchy, you have things like Tiger Woods or Oprah Winfrey, or what have you.
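The "adjust the stuff in between" step can be sketched in a few lines. This is only a toy under loud assumptions: it uses a single artificial neuron (a perceptron) rather than a convolutional network, and the four-pixel "images", labels, and function names are all made up for illustration.

```python
# Toy illustration only: one artificial "neuron" (a perceptron), not a
# convolutional neural network. All data below is hypothetical.

def predict(weights, bias, pixels):
    """Return 1 ("mostly light") or 0 ("mostly dark") for a list of pixels."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Nudge the weights only when the guess disagrees with the label."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # 0 when correct
            if error != 0:
                weights = [w + learning_rate * error * p
                           for w, p in zip(weights, pixels)]
                bias += learning_rate * error
    return weights, bias

# Hypothetical labeled inputs: four pixel intensities plus a label.
EXAMPLES = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.1, 0.0, 0.2, 0.1], 0),
    ([0.8, 0.9, 0.6, 1.0], 1),
    ([0.2, 0.1, 0.0, 0.3], 0),
]

weights, bias = train(EXAMPLES)
print([predict(weights, bias, pixels) for pixels, _ in EXAMPLES])
```

A real image classifier stacks many layers of such units and learns from far more labeled data, but the loop has the same shape: guess, compare with the label, adjust.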

As I say, it works perfectly fine for simple categorization. But it doesn’t work the minute the problem gets a little bit harder. Suppose you see an image like this. You might be able to get your neural network – as these things are called – to recognize the barbell. That wouldn’t be that hard. If you’re lucky, it would recognize the dog. It might not, because the ears are in a configuration that dog ears almost never appear in.

Sort of straight out, right? That might actually stump the machine. But whether or not that stumped the machine, I’m pretty sure that your neural net would not be able to tell you that this is an unusual scene; that you don’t see a dog doing a bench-press every day, and this is something out of the ordinary.

Here’s another example: This ran in the New York Times when this came out. This was a paper that said, “Hey, wow! These deep learning systems, they can caption images now!” If they really could do that perfectly, I would be really impressed. But what we’ve got now is certain cases that work really well, and others, not so well.

Here’s a case that works really well: A person riding a motorcycle on a dirt road. You show the computer this image and it gives you the right answer. Here on the right, it gets it right too: Group of young people playing a game of Frisbee. If you just look at these examples, you’d say, “We’ve solved the problem. The machine understands what’s going on.”

Then you show it this one. I would say the correct answer is maybe a parking sign with stickers on it, but you could describe it differently. None of those would be what the machine gives you, which is refrigerator filled with lots of food and drinks. Makes me think of the Oliver Sacks book, “The Man Who Mistook His Wife for a Hat.” If my child did this, I would think there were neurological problems.

I would rush them to the doctor. The system doesn’t really understand what a parking sign is, what a sticker is, what a refrigerator is, what drinks are, so it’s looking for the nearest thing in its database, which is this melange of colors, but that’s not really understanding.

I’m about to show you a video that IEEE Spectrum put together last year after the DARPA competition. DARPA was trying to help people build “emergency robots,” and people did all kinds of work in their labs to build robots that could do things like open doors in case of emergency. People were thinking of the Fukushima event, so you want to send robots in where you can’t send people.

All the things I’ll show you were well practiced by the labs that participated in this competition, but as you’ll see, the results left something to be desired. There’s more you can see on YouTube later, but that’s probably enough mocking those particular sets of robots. The broader point that I want to make is that what we’re good at right now as a field, in artificial intelligence, is the stuff on the left: the routine things for which we have big data.

So if you have a lot of data about opening doors in a particular environment, you’re great.
But what if the environment changes? Then you have only little data, the unusual but important things. Or, what I jokingly call, “small data.” Humans are really good at small data, but machines still aren’t very good at it. Part of it is because there’s little depth of understanding, not even common sense. I recently wrote this article with Ernie Davis on “Common Sense Reasoning in AI,” and the people who put together the cover made this great cover that makes the point very nicely that you have the robot here that is sawing a tree limb. One way you could learn about which side of the limb to sit on, when you were sitting with your chainsaw, would be to collect a lot of data.
But, you know, this is not good for people sitting below the chainsaw, and not good for the robot. You don’t want to learn this on the basis of big data, you want to have more abstract principles. Things get worse when you get to scientific reasoning.
Here’s a multiple choice exam, originally drawn from eighth grader questions, made by Paul Allen’s Allen AI Institute. What do earthquakes tell scientists about the history of the planet? One possibility – multiple choice – is, A: Earth’s climate is constantly changing. B: The continents of Earth are continually moving. C: Dinosaurs became extinct about 65 million years ago. Or, D: The oceans are much deeper than millions of years ago.
Well, apparently if you’re a machine, most models that entered the competition said: “C: Dinosaurs became extinct about 65 million years ago.” Why is that? Probably because they’re doing the equivalent of Google search, they’re doing keyword search. And “history of the planet,” “65 million years ago,” “dinosaurs,” and “extinct,” all kind of pop up at once. There’s no real understanding here of what an earthquake is, or what the history of the planet is. Hopefully many of you sitting in CERN realize that the answer is B, but not many of the machines did. As Wired magazine put it, “The Best AI Still Flunks 8th Grade Science.” I’ve already told you my vision is AI systems that could do scientific reasoning on their own, and we’re not there yet.
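To see how pure keyword matching can land on answer C, here is a minimal sketch. It is not the method any competition entry actually used. The four-sentence corpus, the stopword list, and the scoring rule (largest word overlap with any single sentence) are all assumptions invented for illustration.

```python
import re

# Hypothetical stopword list and mini "corpus", made up for this example.
STOPWORDS = {"a", "about", "are", "do", "each", "in", "is", "much", "of",
             "on", "other", "over", "per", "s", "than", "that", "the",
             "what", "where"}

CORPUS = [
    "Dinosaurs became extinct about 65 million years ago, a major event "
    "in the history of the planet.",
    "Earthquakes happen where plates of the crust slip past each other.",
    "The continents ride on plates that drift a few centimeters per year.",
    "Climate shifts over long periods.",
]

QUESTION = "What do earthquakes tell scientists about the history of the planet?"
OPTIONS = {
    "A": "Earth's climate is constantly changing.",
    "B": "The continents of Earth are continually moving.",
    "C": "Dinosaurs became extinct about 65 million years ago.",
    "D": "The oceans are much deeper than millions of years ago.",
}

def keywords(text):
    """Lowercase the text, split into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}

def keyword_score(question, option):
    """Best word overlap between (question + option) and any one sentence."""
    query = keywords(question) | keywords(option)
    return max(len(query & keywords(sentence)) for sentence in CORPUS)

scores = {label: keyword_score(QUESTION, text)
          for label, text in OPTIONS.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Nothing in the scoring knows what an earthquake is. Option C wins simply because its words co-occur with “history” and “planet” in one sentence, which is the failure described above.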
Here’s something I wrote a few years ago and that I stand by every word of. I wrote this for the New Yorker when deep learning became popular, and was front page news in the New York Times. “Realistically, deep learning is only part of the challenge of building intelligent machines. Such techniques lack ways of representing causal relationships – what did what to whom – and are likely to face challenges in acquiring abstract ideas.” Four years later, there’s much hype about deep learning, and billions worth of investment.
But we haven’t had progress on the causal relationships, the abstract ideas, the logical inferences, and so forth. It reminds me of an old parable. The parable is about building a ladder when you want to get to the moon. Solving science through AI is getting to the moon. Selling more advertisements isn’t, so we can use AI now to tell you what else you might buy.
“If you buy that book, you might like this one,” and that’s great, but if you don’t buy the book, it doesn’t really matter. But it matters when it comes to things like medicine. We want the AI to really do it right. Well, building ladders that are getting us an inch closer or an inch here might not be the right approach. What I think we need to think about is the difference between data and abstract understanding.
These are Boyle’s and Charles’s laws that you learned in high school chemistry. The blue dots represent the data. It’s easy for a big data collection machine to organize that data, but what you really want is essentially the lines. You want the idea about: what is the relationship behind this data? So you can interpolate where you haven’t seen things before, extrapolate beyond what you’ve seen before. Which means really, you want your AI systems to do something they haven’t done.
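The gap between the dots and the line can be made concrete with Boyle’s law, P = k / V. The sketch below is an illustration only: the four measurements are hypothetical values chosen to satisfy P × V = 100 exactly, and the function names are invented here. It compares looking up the nearest stored data point with fitting the law and extrapolating from it.

```python
# Hypothetical measurements that follow Boyle's law exactly (P * V = 100).
VOLUMES = [1.0, 2.0, 4.0, 5.0]
PRESSURES = [100.0, 50.0, 25.0, 20.0]

def lookup_nearest(volume):
    """Pure data: return the pressure stored for the closest known volume."""
    nearest = min(range(len(VOLUMES)), key=lambda i: abs(VOLUMES[i] - volume))
    return PRESSURES[nearest]

def fit_boyle_constant(volumes, pressures):
    """Least-squares estimate of k in P = k / V."""
    inverse = [1.0 / v for v in volumes]
    numerator = sum(p * x for p, x in zip(pressures, inverse))
    denominator = sum(x * x for x in inverse)
    return numerator / denominator

K = fit_boyle_constant(VOLUMES, PRESSURES)

def predict_from_law(volume):
    """Abstract relationship: apply the fitted law at any volume."""
    return K / volume

unseen_volume = 10.0  # outside the range of the stored data
print(lookup_nearest(unseen_volume))    # repeats the value stored for V = 5
print(predict_from_law(unseen_volume))  # follows the curve beyond the data
```

The lookup can only repeat what it has already seen, while the fitted relationship gives a sensible value at a volume that was never measured.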
Which is ask the question of “Why?” Not just “How much?” and “When?” and “What is correlated with what?” But “Why are the things in the world related to other things?” I think we have only one model of a creature that asks this question a lot. And that would be the human toddler. This is my daughter, Chloe. She’s two and a half, and she asks me, “Why?” roughly 20 times a day. “Why is it dark now?” “Why are you wearing a hat now?” She’s constantly asking “Why” questions.
This is her brother, he’s a little older, – it’s an older picture – he’s four. When he was two, he was studying what I would describe as: “the functional utility of the hole” on the top of a raspberry. He developed the concept, not for the first time in history, of the “fingerberry” as he called it. So here he is with the fingerberry, and maybe a few days after this picture, I was on the road, I was giving a talk, and my wife sent me this text message. She says, to my son Alexander, – he’s two and a half years old – “Which of your animals will come to school today?” And he says, “Big bunny. Bear and platypus are eating.”
So she walks to the next room where his bedroom is, and she sees that he’s created a diorama of bear and platypus and they are, in fact, eating. At this point, he was 100 percent honest in his answers. What does this tell us? Well, for one, he understands complex syntax. In a linguist’s term, this is called a “WH question.” “Which of your animals will come to school today?” If you’ve worked with Siri, you know that syntax is still a challenge sometimes for computers.
He was able to give novel answers depending on recent updates to the state of the world. Or, instead of memorizing things and finding the most popular answer that had been Googled for before, he was thinking about what had happened right now, what was the current state of the world, and directly reflecting that in his answers. He was doing logical reasoning; if they’re over there, they’re not coming with me. So he’s able to integrate all this, and importantly from the perspective of AI, he didn’t do this with massive data, he did this with modest data.
Two years, basically, of people talking to him. First six months, I don’t think he understood the phonology. So two years of people talking to him, and no direct access to what we call in my trade, “labeled data.” As I told you, you have a picture of Tiger Woods, and you have the label; a picture of a golf ball, and the label “golf ball.”
He doesn’t get that most of the time, and yet he was able to work it all out. So by now, a year and a half later, he’s very flexible. When I was putting together this talk, I showed him this and said, “What’s going on in the picture?” He said, “It’s an elephant carrying an umbrella.” It’s not like in one of his books, there was an elephant with an umbrella, and he had memorized that. He has a perceptual system integrated with his language system, and he puts it all together.
I said, “Is the umbrella the right size for the elephant?” He said, “No, it’s too small.” He can, on the fly, make inferences to things for which he has a very small amount of data. This brings me to my main point. Which is very much inspired by where we are. CERN is this vast, interdisciplinary and multi-country consortium to solve particular scientific problems.
Maybe we need the same thing for AI. Most of the efforts in AI right now are individual companies, or small labs working on small problems, like how to sell more advertising, and things like that. What if we brought people together to try this moonshot of doing better science? And what if we not only brought together machine-learning experts and engineers who can make faster hardware, but researchers who look at cognitive development and cognitive science? I think maybe we could make some progress. I’m not saying humans are better than machines at everything; humans aren’t nearly as good at arithmetic. But we are better at asking “Why?” and understanding science.
Maybe we can learn something from human children. So, here’s a way to think about it: We’ve been working on computers for 60 years. We’ve made them much smaller, much faster, much more energy efficient. This watch that I have can do everything the ENIAC could do with an entire room 60 years ago. And yet, we still haven’t understood how to program into a machine the flexibility of human thought. Or the ability of a child, toddler, tiny toddler, to learn something new. Maybe it’s time that we try. Thank you very much.