Read the full transcript of Princeton University Professor of Computer Science Arvind Narayanan’s talk, “AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference,” delivered on April 17, 2025.
The presentation was followed by a discussion with Daron Acemoglu, MIT Institute Professor and Co-Director of the Shaping the Future of Work Initiative, along with audience Q&A.
Opening Remarks
ASU OZDAGLAR: Maybe we should get started, right? Hi, everyone. It’s a pleasure to welcome you all to tonight’s talk with Professor Arvind Narayanan. The Schwarzman College of Computing is honored to co-host this event with MIT’s Shaping the Future of Work initiative. We’re excited to have this unique convergence of minds and missions at the intersection of technology, society, and the future of work.
We’re honored to be joined by Professor Arvind Narayanan from Princeton, also the co-author of the book AI Snake Oil. It’s such a critical time, when there’s so much debate and discussion around the promise and peril of AI, with many people focusing on existential risk. Arvind and Sayash’s book brings a breath of fresh air and provides a balanced perspective on how we can navigate the hype and reality of AI. I personally recommend this book to everyone.
Arvind in the book draws a very effective parallel with snake oil, whose sellers promise miracle cures under false pretenses: sometimes ineffective but harmless, but in other cases with harms extending to loss of health or life, very similar to AI. AI snake oil is AI that does not and cannot work. And the goal of the book is to identify AI snake oil and to distinguish it from places where AI can work very effectively, especially in high-stakes settings such as hiring, health care, and justice.
I’m thrilled to represent the Schwarzman College of Computing as the deputy dean of academics. And our dean, Dan Huttenlocher, is also here with us tonight. And it’s truly a pleasure to be here with the dynamic leaders of the Shaping the Future of Work initiative, Daron Acemoglu and David Autor. Simon Johnson is not here.
Shaping the future of work brings an evidence-based lens to economic and policy impacts of automation. And the Schwarzman College is reimagining how we do research and teach computing with social implications at our core. What unites these efforts and why we’re so excited to have Arvind here tonight is a shared commitment to clarity, rigor, and technical expertise in how AI technology is developed and deployed.
Tonight’s presentation and conversation promise to enlighten us and make us think about these important issues. And with that, please join me in welcoming Professor Daron Acemoglu from the Department of Economics, Institute Professor and Faculty Co-Director of the Shaping the Future of Work Initiative.
DARON ACEMOGLU: Thank you very much. Thank you, Asu, and thank you for everybody for being here. This is a great event, and I’m delighted that people have recognized it as a great event and are here.
I want to say just two more words about the initiative for shaping the future of work, which is co-led by myself, David Autor, and Simon Johnson, who unfortunately couldn’t be here. And part of the reason I want to say that is because I want to emphasize how synergistic Arvind’s agenda is with what we want to do. We’ve launched this initiative because we’re worried about the future of work, the future of inequality, the future of productivity in the age of digital technologies and AI.
And part of the reason we are concerned is precisely about how AI and other technologies are going to be used. And the perspective, as the word shaping suggests, is one in which we argue that the future of these technologies is not given, is not preordained, but different technologies have different consequences and we want to understand those consequences and we want to steer technology by a variety of channels, mostly coming from the academic research we’re doing and our collaborators are doing and our affiliates are doing, towards the more socially beneficial directions.
And I think I cannot imagine somebody better than Arvind to give much greater depth and breadth to this, because Arvind, a professor of computer science at Princeton and the director of the Center for Information Technology Policy, brings, even without the book, a unique perspective: great technical expertise combined with a very clear-eyed and deep understanding of many applications of AI.
And that is exactly the space where we need to be: not excessive optimism, not excessive pessimism, but understanding what AI can do productively, what it cannot do at the moment, perhaps ever, and what it can do but will not be great at. So Arvind’s book, AI Snake Oil, which you’re going to hear about, is full of amazing insights, ranging from predictive AI to generative AI, large language models, social media, machine learning and the mistakes you can make with machine learning. I think we’re going to get a glimpse of many of these excellent points and hopefully a lot of food for thought for everybody.
Arvind is going to speak for twenty, twenty-five minutes, and then we’re going to have a little bit of a conversation for fifteen minutes or so, and then we’re going to open it up for Q and A. So please give a warm welcome to Arvind, and we’re really delighted to have him here.
Presentation
ARVIND NARAYANAN: Hello, everybody. Thank you, Daron and Asu, for such kind words. It’s really my pleasure to be here today. And I really mean it because the origin story of this book is actually right here at MIT. So let me tell you how that happened.
This was way back in 2019, when I kept seeing hiring automation software. And the pitch of these AI companies to HR departments was: look, you’re getting hundreds of applications, maybe a thousand, for each open position. You can’t possibly manually review all of them, so have each candidate record a video interview instead.
And the pitch was that their AI will analyze that video and look at the body language, speech patterns, things like that, in order to figure out the candidate’s personality and their suitability for your particular job. And you can see here, the software has characterized this person on multiple dimensions of personality, and that’s only one of five tabs. On the top right, they have been characterized as a change agent and their score is 8.98: two digits of precision. That’s how you know it’s AI. That’s how you know it’s accurate.
And it doesn’t seem to me that there is any known way by which this could possibly work. And sure enough, now six years later, none of these companies have released a shred of evidence that this can actually predict someone’s job performance. And in the few instances that journalists have been able to use creative methods to try to see if these techniques work or not, here’s the kind of thing that they have found.
So here was an investigative journalist who uploaded copies of a video. In one case, they digitally added a bookshelf in the background. In another experiment, they tried glasses versus no glasses. Radically different scores. Right? So I didn’t have this evidence back then, but this is what I suspected. And coincidentally, at that time, I was invited to give a talk here.
And I gave a talk called How to Recognize AI Snake Oil. And I said, look, there are many kinds of AI. Some things, like generative AI (which wasn’t called generative AI back then), are making rapid progress. They work well. But there are also claims being made like this one. I called it an elaborate random number generator.
And people seemed to like that talk, so I put the slides online the next day. I thought 20 of my colleagues would look at it. But in fact, the slides went viral, which I didn’t know was a thing that could happen with academic work. And I realized it wasn’t because I had said something profound, but because many of us suspect that a lot of the AI-related claims being made are not necessarily true. But these claims are being made by trillion-dollar companies and supposed geniuses, so we don’t feel we necessarily have the confidence to call them out.
And so when I was able to say, look, I’m a computer science professor, I study AI, I build AI, and I can tell you that some of these claims aren’t backed by evidence, that seemed to resonate with a lot of people. Within a couple of days, I had something like 30 or 40 invitations to turn that talk into an article or even a book. I really wanted to write that book, but I didn’t feel ready, because I knew there was a lot of research to be done to present a more rigorous framework for understanding when AI works and when it doesn’t. And that’s when Sayash Kapoor joined me as a graduate student.
So we did about five years of research. And the book is a summary and a synthesis of that research, some of which we’ve also published in the form of a series of papers leading up to that. So let me just take the next fifteen minutes or so to give you some of the main ideas from the book.
The starting point of the book is to recognize that AI is not one single technology. It’s an umbrella term for a set of technologies that are only loosely related to each other. This is ChatGPT. I don’t need to tell you what it is. But on the other hand, technology that banks might use in order to classify someone’s credit risk is also called AI. There’s a reason they’re both called AI. They’re both forms of learning from data.
But in all the ways that matter in how the technology works, what the application is, and most importantly, how it might fail and what the consequences are, these two things couldn’t be more different from each other. So the thing on the right is an example of what we call predictive AI. And what all predictive AI applications have in common is a certain logic for decision making. It’s ways of making decisions about people, often very consequential decisions, based on a prediction of what they will do in the future or what will happen to them in the future. And that decision is made using machine learning based on data from past similar people.
So this is used in hiring, and there the logic is who will do well at a job. It’s used in lending. There the logic is who might pay back a loan or not. It’s used in criminal justice. And there the logic is who might commit a crime or not commit a crime. It’s used in health care. It’s used in education, ever expanding set of domains. And predictive AI is something we’re very dubious about, and I’ll come back to that in a second.
And then, of course, there’s generative AI. In addition to generating text, there’s an ever expanding variety of things that it can do. We also talk a lot in the book about social media algorithms and what are some of the societal scale risks that can arise out of that as opposed to discrete risks to particular individuals. And we talk a little bit about self driving cars and robotics, which I will come back to in a few minutes.
So why are we so skeptical about predictive AI? If we look at the criminal justice example, for instance: in the majority of jurisdictions today, when someone is arrested, the judge faces a decision. There are months, maybe years, until the trial. Should that person spend that time in jail, should they be free to go, should there be an ankle monitor, or any number of other options for release? That decision is made, or at least guided, by an algorithmic system: an automated decision-making system, or at least a decision-recommendation system, built on statistical learning. You could call it AI.
It’s something that falls under the umbrella of what we call predictive AI. And the problems with this have been known for a long time. In 2016, there was a well-known investigation by ProPublica called Machine Bias, where they did a Freedom of Information Act request. These companies are notoriously secretive. They managed to get a lot of data, and they showed that the false positive rate for the particular algorithm they studied was twice as high for Black defendants as it was for white defendants.
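To make the disparity ProPublica reported concrete, here is a small illustrative sketch. The numbers below are invented for illustration, not ProPublica’s actual data; the false positive rate here is the share of people who did not reoffend but were still flagged as high risk.

```python
def false_positive_rate(labels, preds):
    """Among people who did NOT reoffend (label 0), the fraction
    the tool flagged as high risk (prediction 1)."""
    negatives = [p for y, p in zip(labels, preds) if y == 0]
    return sum(negatives) / len(negatives)

# Invented toy data: label 1 = reoffended, prediction 1 = flagged high risk.
group_a_labels = [0, 0, 0, 0, 1, 1]
group_a_preds  = [1, 1, 0, 0, 1, 0]  # 2 of 4 non-reoffenders flagged
group_b_labels = [0, 0, 0, 0, 1, 1]
group_b_preds  = [1, 0, 0, 0, 1, 0]  # 1 of 4 non-reoffenders flagged

print(false_positive_rate(group_a_labels, group_a_preds))  # 0.5
print(false_positive_rate(group_b_labels, group_b_preds))  # 0.25
```

Even with identical reoffense rates in both groups, the tool in this toy example flags innocent people in group A at twice the rate of group B, which is the shape of the disparity the investigation found.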
And so we’ve known about these problems with racial bias. But when I looked at that study, there was one thread in it that I felt didn’t get picked up nearly enough, which is that the predictive accuracy of these methods is not really that high. If you know how accuracy is measured in machine learning, AUC, or area under the curve, is often used. And the best numbers you can get here are less than 70%, where 50% is random guessing.
Right? So we’re making decisions about someone’s freedom based on something that’s only slightly more accurate than the flip of a coin. And the vast majority of people who are predicted to be at high risk did not, in fact, go on to commit another crime. So we felt, how is it ethical to use this for anybody, whether it’s a Black defendant or a white defendant or anyone else?
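The AUC numbers can be made concrete with a minimal sketch (my own illustration, not from the talk): AUC is the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, so 0.5 is a coin flip and values below 0.7 are only modestly better.

```python
def auc(labels, scores):
    """Probability that a random positive example gets a higher
    risk score than a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(1.0 if p > n else 0.5 if p == n else 0.0
                     for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))

# A perfect predictor separates the groups completely: AUC = 1.0
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0

# A weak predictor, like the risk tools discussed: AUC around 0.67
print(auc([1, 1, 1, 0, 0, 0], [0.9, 0.4, 0.15, 0.8, 0.2, 0.1]))
```

In the weak case, roughly one comparison in three ranks a person who never reoffended as riskier than one who did, which is the gap between a coin flip and the tools’ reported accuracy.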
So that is one of the main points we make in the book and also in papers leading up to that, that it’s hard to predict the future. And it’s not a matter of a limitation of the technology. We just don’t know who’s going to commit a crime in the future. And so we shouldn’t so easily accept this idea of pre-crime of determining someone’s fate based on a prediction of a crime they will commit in the future as opposed to a determination of guilt.
Let’s talk about generative AI.
Generative AI: Benefits and Limitations
There’s a lot more to say about predictive AI, but maybe we can save that for the conversation and for the Q and A. Generative AI, of course, in addition to text, it can generate any one of a number of things. And look, there are limitations. There’s a lot of hype and I’ll talk about some of the downsides in a second. But we’re also very clear in the book that generative AI is useful to basically every knowledge worker, anyone who thinks for a living.
And I’m sure we’ll talk about the labor implications. But I also wanted to emphasize for a second that a big aspect of it is that it’s a technology a lot of the time that’s just very fun to use. And I just wanted to keep that in the conversation because that is often easily forgotten when we’re talking about these serious aspects of AI. And in my own personal use of AI, I certainly use it for my research, but quite a bit in my personal life as well. I have two young kids, and I often find myself using AI in ways that really enrich our relationship when I’m spending time with my kids.
So the other day, for instance, I was teaching my daughter fractions. And it’s hard for a kid to understand the idea of fractions. So I pulled out an AI app on my phone. These days, you can produce a little app that’s created on the spot by AI based on a text description of what you want the app to do. So I asked the AI agent to create an app to visualize fractions and it made this little game.
It’s a little slider and it generates a random fraction and asks the child to try to guess where it goes on the line. And once they guess, it will check that guess and it will divide the line into many parts, visualize what one third looks like, give a score and keep score and keep generating new fractions. So we played with this for like fifteen minutes and it really helped her. And I’ve done this with all kinds of things, generating random clock faces as a way to teach her to tell time, etcetera. And what’s really cool about this is that you make this app once, you play with it.
You know, my child doesn’t have a huge attention span. It’s useful for fifteen minutes and then it’s done. You throw it away. Right? And that’s amazing. You couldn’t have imagined doing this a couple of years ago because it would have taken at a minimum several hours to create an app.
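The fraction game described above can be sketched in a few lines. This is my own rough reconstruction of the game’s logic, not the actual app; the scoring rules are made up for illustration.

```python
import random
from fractions import Fraction

def new_fraction():
    """Pick a random proper fraction for the child to place on a 0-to-1 line."""
    denom = random.choice([2, 3, 4, 5, 6, 8])
    return Fraction(random.randint(1, denom - 1), denom)

def score_guess(target, guess, tolerance=0.05):
    """Award points by how close the guess (a float in [0, 1]) lands."""
    error = abs(float(target) - guess)
    if error <= tolerance:
        return 10
    if error <= 2 * tolerance:
        return 5
    return 0

print(score_guess(Fraction(1, 3), 0.33))  # close guess: 10
print(score_guess(Fraction(1, 3), 0.80))  # far off: 0
```

A real version would add the slider and the visual division of the line, but the point stands: the whole thing is small enough to be generated on the spot and thrown away after fifteen minutes.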
Irresponsible Release Practices and Harmful Consequences
Okay. So with that said, we are also critical in the book about the generative AI industry’s irresponsible release practices. And there are, of course, many harmful consequences of this. And as we say in the book, it’s like everyone in the world has been simultaneously given the equivalent of a free buzzsaw.
There are AI-generated books on Amazon by people just trying to make a buck. In some cases, it’s just an annoyance. Maybe you lost 99¢, which is often what these things are sold for, because you unknowingly bought an AI-generated book. But in some cases, there are things like foraging guides for mushrooms generated by AI, full of hallucinations. And those can have life-or-death consequences.
And there have in many cases been life-or-death consequences: people developing companionships with AI bots, and the bots encouraging their suicidal tendencies, that sort of thing. And the biggest one in our mind is these AI nudification apps, which you’ve probably heard about. It’s been an epidemic in so many countries around the world, especially in high schools. These are apps that can take a picture of a person and create a nude image of that person based on the photo that you upload.
And this has affected hundreds of thousands of people, primarily women, around the world. And not only AI companies, but also policymakers being so slow to recognize this problem and do something about it has been a real shame. Since 2019, since long before the latest wave of generative AI advancements, we’ve had evidence that this is a problem happening on a massive scale.
The Hidden Labor Behind AI
So when we talk about AI and labor, one very clear thing we need to talk about is the labor that goes into making these large-scale generative AI models. Yes, they’re trained on data from the Internet, but they’re also post-trained, as it’s called, based on human interaction.
And there is a lot of human annotation work that is necessary to essentially clean the training data, if you will, that goes into making these models. And this work is offshored to developing countries. It’s trauma inducing work because day in and day out, you have to look at videos of beheadings or racist diatribes or whatever and make sure that doesn’t get into the input or output of AI. And the working conditions are so precarious that a lot of these AI companies have often turned to people who don’t have a lot of options in the labor market, like refugees or people in countries experiencing hyperinflation or prison labor and so forth. So something is clearly wrong here, and we need new labor rights for this.
So having been critical of companies, I do want to say that it’s not about putting all of the blame on the companies. There’s a lot of personal agency that all of us need to exercise. And we need to use judgment in knowing when the use of AI is even appropriate, which is separate from the issue of whether it works or not.
When AI Is Inappropriate: The AI Mayor Example
A good example of this comes from the recent election. This example is not in our book because it’s pretty recent. But there was a candidate in Cheyenne, Wyoming, who wanted the mayor to be an AI chatbot. He said that if he were elected, all the decisions would be made using this chatbot. As far as I can tell, it was only ChatGPT behind the scenes, but he called it VIC, for Virtual Integrated Citizen, which certainly sounds more sophisticated.
And, yeah, I learned about this because the Washington Post called me to ask, what are the risks of having an AI mayor? I was very confused by that question. I kind of blurted it out: “What do you mean, risks of having an AI mayor? It’s like asking what are the risks of replacing a car with a cardboard cutout of a car. I mean, it looks like a car, but the risk is that you don’t have a car anymore.” I regretted it as soon as I said it, it was a little bit snarky, but the Post printed it anyway.
So let me explain what I mean. His point was that politics is very messy, inefficient, a lot of fighting, etcetera. Let’s make it more efficient with chatbots. But that completely misses the point. The reason politics is messy is that that’s the forum we’ve chosen for resolving our deepest disputes. And to try to automate that is to miss the very point.
A Framework for Evaluating AI Applications
This is more or less one of the last things I want to say. I’m going to wrap up in a few minutes. But this is the framework we use in the book for thinking about any particular AI application.
It’s a two-dimensional figure. On one dimension: how well does it work? Does it work as claimed, is it overhyped, or does it not work at all and is a kind of snake oil? On the other dimension, we capture the fact that AI can be harmful because it doesn’t work as claimed, that is, it’s snake oil, or it can be harmful precisely because it works well and does exactly what is claimed.
So let me give you examples of each of those kinds of things, starting with the top right here. I mentioned those video interviews. I mentioned criminal risk prediction. And then there’s cheating detection: when professors suspect that students are using AI, they might turn to these detection tools, but they just don’t work, at least as of today. They’re more likely to flag non-native English speakers, and I’ve heard so many horror stories of students being falsely accused. As things stand today, that very much feels like snake oil to me.
On the bottom right, though, are things like mass surveillance using facial recognition. Historically, facial recognition hasn’t worked that well, but now it works really, really well. And that, in fact, is part of the reason it’s harmful if it’s used without the right guardrails for civil liberties and so forth. We also talk about content moderation, where we explain in what way it’s overhyped.
But basically, our interest in the book is everything except the bottom left. Those are applications, simple things like autocomplete, for instance, that fade into the background and really work well. Our goal is to have an intervention that equips people to push back against AI that is problematic. You wouldn’t want to read a book that is 300 pages on the virtues of autocomplete.
AI Is Whatever Hasn’t Been Done Yet
And I say that because I think that bottom left corner is very important. There’s more in that corner than we might suspect. And to explain that, let me give you a funny definition of what AI is. And this definition says, AI is whatever hasn’t been done yet.
So what does that mean? What it means is that when a technology is new, when its effects are double edged, when it doesn’t work that well, that’s when we’re more likely to call it AI. When it starts working well, it’s reliable, it kind of fades into the background. We take it for granted. We don’t call it AI anymore. And this has happened over and over with many kinds of automation.
So Roomba and other robotic vacuum cleaners, I mentioned autocomplete, handwriting recognition, speech recognition, which I’m sure many of us use on a daily basis to transcribe. Even spell check was at one point a cutting edge example of AI. So this is the kind of AI we want more of. We want technology that’s reliable, that does one thing, does it well, and kind of fades into the background. So that’s something that we hope that our critical approach can nudge the industry towards.
And our optimistic prediction about AI is that one day much of what we call AI today will fade into the background, but certainly not all of it. Not criminal risk scoring, for instance: there are intrinsic normative questions there that won’t go away by making the technology more reliable. But with self-driving cars, although they are in the news today, often for the wrong reasons, because of accidents and so forth, those are solvable engineering problems. There has already been dramatic progress in solving them, and one day these things are going to be widely used.
They’re going to become part of our physical infrastructure. We’ll take them for granted. And the word car one day will just mean self driving car. Right? And we’re going to need a new name for what we call cars today, like maybe manual car or something.
And there are some downsides there. We have to think about the labor implications for, you know, gig workers and so forth. But ultimately, it will have been a good thing, because there are one million deaths from auto accidents every year. So again, that’s our vision for a positive kind of AI.
Recommendations for Shaping AI
So broadly speaking, in terms of what I think we need to change, you know, in terms of shaping AI for the better, there are many different recommendations in the book, but I would cluster them into three big areas.
One is that we need to know which applications are just inherently harmful or overhyped and probably should not even be deployed. Secondly, even when it does make sense to deploy an AI application, there are often so many risks, and we need guardrails for those. And the third is more structural. It’s really about the fact that AI is exacerbating some of the inherent capitalist inequalities that we see in our society. So how do we limit companies’ power and redistribute AI’s benefits?
AI as Normal Technology
Let me take one last minute to tell you about a paper that we released just a couple of days ago, which is a follow-up to AI Snake Oil. AI Snake Oil looks at what’s going wrong with AI today and how we fix it. Our new paper is called AI as Normal Technology, and it’s a vision for AI over the next twenty years or so. It takes a longer-term look and tries to give a framework for thinking about AI that’s an alternative to the major narratives we have today.
There are three major narratives about AI today. The first one is that it’s super intelligence that will usher in a utopia. The second one is closely related. It’s a super intelligence, but it will doom us rather than benefit us. And the third one is that we should be very skeptical about AI. It’s just a fad. It’s so overhyped. It’s going to pass very soon.
And our approach in AI Snake Oil is the middle ground. It doesn’t fit into any of these narratives. But these three narratives are so compelling that we’re often thought of as saying that AI is a fad that will soon pass. That’s not what we’re saying. Especially in the new paper, we make that very concrete: we give a fourth way to think about AI, closely modeled on what we know from past technological revolutions like the Industrial Revolution, electricity, and the Internet.
We do think AI is going to have transformative effects, but we think they’re going to unfold over a period of many decades rather than suddenly, and they’re going to be both good and bad. We think a lot of the superintelligence and catastrophic risks have been greatly exaggerated, and that we’re already in a good place to know how to address some of those risks if they do come up. And on the basis of all of this, we have some policy ideas for steering and shaping AI in a more positive direction. So I’ll stop here, and I really look forward to the conversation.
Fireside Chat (Arvind Narayanan and Daron Acemoglu)
DARON ACEMOGLU: Thank you so much. These are fancy chairs. Yes. As long as I don’t fall off them. All right.
That’s fantastic, Arvind. Thank you for giving a very, very succinct but very effective summary of the book. So I want to start from the predictive AI part. I think that was one of the items in the book that I thought was super interesting and super revealing. But I want to dig a little bit deeper and understand where the more foundational concerns about predictive AI are coming from.
As an economist, perhaps one place to start is by distinguishing one-person decision problems, or one-person interaction problems, from social problems in which there is an interaction. If you, for example, build an AI tool for a runner to decide when she’s going to need more liquids or when she’s had enough running, that’s a predictive tool, but it doesn’t have this social interaction aspect. It still has human agency. So you might say, well, human agency means we can never predict anything.
But I’m not sure whether you want to go there, or whether it’s more about these game-theoretic situations where what I do will depend on what others do, these complex interactions that we don’t understand. And I guess if it’s the latter, is there a way to, for example, cut some of the more complex things into smaller pieces and make some progress? For example, I don’t see Sendhil here, but there’s Sendhil’s very interesting work on bail. If that’s a social problem, could we reduce it, with the right guardrails, to something like a decision problem for a judge?
ARVIND NARAYANAN: Thank you. That’s a fantastic question. And yes, Sendhil and others have a great paper, Prediction Policy Problems, that lays out their vision for how to use what we call predictive AI for various things such as bail.
I was mentioning that our work is based on some papers we’ve written. The main one on this particular question is called Against Predictive Optimization, which is intended as a counterpoint specifically to the Prediction Policy Problems paper. And to your point exactly: if we were using AI to predict for a runner when they might need fluids or whatever, that certainly doesn’t raise these concerns at all. Yes, it is something about the social nature of it. Specifically, it’s about the fact that an entity with power is exercising that power over an individual, right?
And there, I think we need to go beyond concerns of accuracy and economic efficiency and ask, from a philosophical perspective, when is this exercise of power justified? So in our paper, we actually engaged with philosophers a little bit; we read that literature and tried to connect it with some of these more concrete AI questions. So while we do say that the accuracy is very poor (how can we do this when it’s only slightly better than the flip of a coin?), even if the accuracy were much better, at a fundamental normative level we think, for reasons we discuss in the paper, that when the relationship is one of exercising power over someone, there are more considerations that come into play.
DARON ACEMOGLU: I think that’s a very important point. The statement that technology is never neutral is now part of the folklore. But it’s not just that it’s not neutral: new technologies really change the power balance, especially with large corporations. Which is, I guess, a good segue to my second question, which is generative AI.
I thought the generative AI discussion was very interesting as well. So one take, which I think is close to my view, is this: there are a lot of very exciting capabilities in generative AI, but there aren’t that many applications. I don’t see pretty much any applications, except in a very few areas like programming subroutines, etcetera, that are really going to change the production structure yet. I think that’s not exactly the way you put it, but I think it’s similar. So is there a fundamental reason for that? Or is this just a passing phase?
ARVIND NARAYANAN: Yes. So I completely agree with you that that’s the state of things right now. Where we might disagree maybe a little bit is that I’m not so sure it’s fundamental. I think it can change in the future, and it’s already changing now. And let me explain what I mean.
This is a big part of what we get into in the AI as Normal Technology paper. When ChatGPT came out, the fact that it was so general purpose that you could make it do different tasks simply by changing the prompt really misled, I think, not just a lot of users, but also the companies themselves from talking to many developers in the AI industry into thinking that this was a new paradigm of software development, that this had obviated the need for building software to do specific things for you, software in the legal sector or software for helping you with your writing or whatever it is, then you can use these general purpose models. And going forward, all that was going to be needed is prompting. And that approach, it was tried for a year or two and has miserably failed.
We analyze this in our newsletter — we have an AI Snake Oil newsletter where we use the foundational approach in the book to analyze ongoing developments — looking at why many products that were simple wrappers around large language models, and that tried to actually get them to do useful things in the real world instead of simply spitting out text, have been pretty bad failures.
So there was a device called the Rabbit. Do folks remember this? There was a Marques Brownlee review saying it was the worst thing he had ever reviewed, and there was a bit of a scandal about that and so forth. So that is exactly an example of what you pointed out, which is that the capability is there.
These large language model based agents are capable of doing very interesting things, like navigating a website or doing shopping for you. But because companies haven’t developed products around them and gotten the reliability rate up from, let’s say, the 80% where it is now all the way to the 99.99% that we expect of any software product, these are pretty much dead on arrival. People are complaining that it ordered their products to the wrong address, right? Would you use a product a second time if it did that? So that’s an example of where companies have dropped the ball in translating those capabilities into products.
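The gap between 80% and 99.99% matters more than it might sound, because errors compound across a multi-step task. A quick back-of-the-envelope sketch (the ten-step task and independence assumption are illustrative, not from the talk):

```python
# Probability an agent completes a task of n sequential steps, assuming each
# step succeeds independently with the same per-step reliability.
def task_success_rate(per_step_reliability: float, steps: int) -> float:
    return per_step_reliability ** steps

# A hypothetical 10-step shopping errand (log in, search, pick the item,
# enter the address, pay, ...):
print(task_success_rate(0.80, 10))    # ≈ 0.107: fails almost 9 times in 10
print(task_success_rate(0.9999, 10))  # ≈ 0.999: fails about once in a thousand
```

This is why a per-step reliability that sounds decent in isolation still produces a product that feels dead on arrival.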
They are changing their approach now. I think there is a very good chance that we’re going to see a mushrooming of products in the coming few years.
DARON ACEMOGLU: No, I definitely didn’t mean to suggest that it was fundamental, but I’m also not convinced it’s going to go away very quickly. One reason, which is different from the ones that you’ve articulated — so let me try it on you — is that every job is a very complex bundle of tasks. And the way that we’ve done automation in the past is through careful or semi-careful division of labor, so that certain tasks can be separated.
Other complementary tasks can be performed by humans, or organizations can be changed. It requires a lot of specific knowledge of the occupation and the industry — a lot of tacit knowledge. And I think the approach of the leading AI companies has been: well, we’re going to get to AGI, or very close to AGI, where everything can be done, so we don’t need any of this tacit knowledge. We’re just going to throw these foundation models at it and they’re going to do it.
And I think that’s never going to work, because even with the very fast advances in foundation models, a foundation model is not going to be able to do everything that even, say, an auditor does. And when you go to an educator or a health professional, I think it’s very unlikely. So you really need this tacit, very specific domain knowledge. And I don’t see that path being followed yet.
ARVIND NARAYANAN: 100% agree. I think this is another area where AI developers really fooled themselves. There was a misleading intuition from the last few years of rapid AI progress: that by scaling up these models and training them on bigger and bigger chunks of the Internet, there would be more and more emergent capabilities. That approach has run out — not only because they’re already training on all of the data they can get their hands on, but also because the new things that are left for these models to learn are exactly things like tacit knowledge.
There is a way to learn tacit knowledge, but it is not in the passive way that models are being trained right now. It is by actually deploying models or AI systems, even relatively unreliable AI systems in small settings and different domains on a sector by sector basis, not in a general purpose way, and learning from those interactions with real users and real domain experts. This is the kind of positive feedback loop that we’ve had, for instance, with self driving cars.
The reason that it took about two decades from the first demonstrations of successful self driving to getting them to a place where they’re able to autonomously ferry people on the roads right now is that you have to slowly scale up. You drive 1,000 miles, and then you collect data that allows you to improve your reliability. Now it’s reliable enough to drive 10,000 miles and then you go to 100,000, etcetera.
That’s a very slow process. We predict that we’re going to see that kind of slow feedback loop going forward on a sector by sector basis.
DARON ACEMOGLU: Okay. Well, I think that’s a segue to the next question, which I wasn’t sure whether we were going to ask, because you wisely stay away from AGI in the book. But here is the argument that many people have in their minds, which makes something like AGI a default position.
At the end of the day, the human mind is a computer — whatever substrate it uses, it’s a computing machine of sorts. So we’re going to build better and better computing machines, which will therefore get to AGI. I think this is a sort of bait and switch that puts anybody who says, well, show me the money, in a defensive position. And in that defensive position, either we have to disagree with that scenario, or we have to say: here are the bottlenecks you haven’t taken into account. I’d be curious to know whether you would completely avoid being put in that position, or whether you have something to say about the presumption or the bottlenecks.
ARVIND NARAYANAN: Yes. No, I’m more than happy to talk about it. These are certainly some of our more controversial views, at least within the tech community. Not here. So let me say two things to this.
One, this has been consistently predicted throughout the history of AI, for more than seventy years. When they first made what were called universal computers — we just call them computers now, but back then they were called universal computers, because it was a very new concept that you could build one piece of hardware to do any task by programming it appropriately, as opposed to building special-purpose hardware for each particular task — the excitement around that was exactly similar to the excitement around general-purpose generative AI models today. They thought: we’ve done the hard part, the hardware. It’s right there in the name.
Now we just need to build the software to emulate the human mind — and they thought that was only a couple of years away. At the pioneering 1956 Dartmouth Conference, they proposed a two-month, ten-man effort to make very substantial progress towards AGI, which was just called AI back then. So over and over, while it might be possible in principle for software to do all the things the human mind does, AI developers have been way off in estimating the gap between where things currently are and where we need to be.
The second thing — I know we’re running out of time, so let me say it very quickly — is something we talk about in Chapter 5 of the book. In our view, human intelligence is not primarily a consequence of our biology, but rather of our technology: the fact that we’ve been using our technology for decades, for centuries, to learn more about the world. The most prized knowledge we have, the knowledge that allows us to do the things we most associate with intelligence — whether it’s medical testing or economic policy — consists of things we learned by doing large-scale experiments on people.
So we predict that very soon these computational limits are not going to matter. The thing that is going to hold AGI back is that it won’t be able to easily transcend the knowledge humans have already accumulated and create new knowledge for itself, because doing so runs into the same bottlenecks we ourselves have faced: experiments, scaling, ethics, and so forth. And we’re not going to let AI do experiments on millions of us without any oversight. So that is going to impose very, very strong speed limits.
DARON ACEMOGLU: Great. That’s an excellent point.
I think we’re running out of time, so I want to bring it to a topic that’s actually much closer to our initiative, which is our concern about whether we’re doing the right kind of AI. David, Simon, and I — all three of us — have these ideas, some based on intuition, some on empirical facts, some on history, that there is a more productive way to develop AI: what we call pro-worker AI, which aims at increasing worker skills, expertise, and productivity, and at creating new tasks or capabilities to do more sophisticated tasks. And we’re worried about whether we’ll ever get there on the current path. You have very nicely cataloged the various mistakes people are making in banking, or at least pretending to bank, on AI that’s unlikely to work — or that, if it works, is not going to be that great. AI hype is perhaps leading to AI over-investment, or to the wrong kind of AI investment.
But at least I didn’t see — or perhaps I missed — the next step in the book: that, therefore, the wrong types of innovation effort and R&D are being made, the wrong kind of startup energy is coming, and the question of whether we can do anything about that, other than, of course, informing the public and policymakers with books and conversations like this. Is there a more concrete agenda of that sort that would make you even more of a fellow traveler with us?
ARVIND NARAYANAN: I think there is a little bit, and I think that’s exactly the critical question. And I’m so glad that you all are looking into that. And I think it needs perhaps economists more than computer scientists.
But what I can say from a computer science perspective is that when we look at what companies are doing, right now the market is just not rewarding that. An example of where this plays out is the use of AI by students. I know we’re talking about labor, but I think it’s somewhat similar. The initial uses of generative AI were primarily for cheating, or other things that are maybe not the best ways to use it. We know there are good ways to use it.
We know that AI, if properly configured, can be a very good tutor despite its limitations such as hallucinations. I use AI very heavily for learning. I haven’t stopped using books, but there are some advantages of using AI. Happy to get into that in the Q and A.
But it’s remarkable: it was only, I think, a couple of weeks or maybe a month ago that Anthropic came up with an AI tutor — just a simple customization of their model to be in a tutoring mode where it’s not just giving out answers to the student, but rather promoting their critical thinking. And it’s striking to me that it took them so little work, but it took two and a half years or whatever of people constantly complaining for them to do it.
And so yes, we can provide lots of technical ideas. But ultimately, we need to either change the incentives for companies through regulation or have much more investment in other organizations, maybe NGOs who are going to develop these AI applications with the public interest in mind instead of leaving it to the AI companies.
DARON ACEMOGLU: Thank you very much, Arvind.
I think that’s a great time for us to transition because I’m sure many people have burning questions for you. The way we’re going to organize this is there are two mics over there. Those of you who want to ask questions, if you don’t mind lining up, and then we can take one from each side in alternating order. Why don’t we start on the right?
Audience Q&A
AUDIENCE MEMBER 1: Thank you very much, a wonderful talk. I’ll just ask a quick layman’s question, as a user of AI for over a year. My question is: how much can you tolerate the errors that AI generates when giving you an answer? For instance, recently I asked AI for a citation — a quotation from a famous person. I got an answer and posted it on my blog, and it was embarrassing, because when I asked the professor, he said, “I never said such a statement.” So the AI created, or imagined, that person X said something like this. I would say that AI gives you a lot of convenience — pattern recognition, very convenient — but 10% or more of what it gives you is errors mixed in with the correct answer. Thank you.
ARVIND NARAYANAN: Yes. Hallucinations are a big problem with generative AI. These are fundamentally stochastic technologies. Even if we could somehow clean all the training data and make sure that you only train it on true statements, the problem would remain because at generation time, it is kind of remixing the statistical patterns in its training data. The hallucination rates have gone down quite a bit over the last couple of years, but they’re not zero. I don’t think they’re going to reach zero in a very short period of time.
And I have also had the experience of people emailing me to ask for my papers, and they’re like, Hey, where is it? I couldn’t find it online. It turns out it was made up by AI and attributed to me.
And so what we train people to do is not just to think about using AI generally in their work, but to identify specific areas of their workflow. And for each of those uses of AI, you have to have an answer to the question: why is it easier to verify the answer than to have done the work myself in the first place?
And if you don’t have an answer to that, don’t use AI. And if you do have an answer, it might save you time or enhance your creativity as the case may be. Thank you.
AUDIENCE MEMBER 2: Thank you, Professor. I wanted to bring back the topic of the fairness, or maybe the bias, of the algorithms. Between six months and a year ago, Sam Altman came here, and when someone asked him about the bias of the algorithms — for example, in the criminal justice system — he said that it’s easier to change the bias in an algorithm than the bias in humans. And I thought, okay, maybe that’s compelling; to be honest, for me it was understandable. What do you say about that? What do you think? Is it possible? Do you think it’s better? And who says what the bias is?
ARVIND NARAYANAN: Yes, thank you. It’s a good debate. Sendhil Mullainathan has also made that claim. He had an op-ed in The New York Times literally saying it’s easier to fix biased algorithms than biased humans.
And I very much see that point of view. I have a slightly different view. I think in theory, it’s certainly possible to fix certain kinds of biases in algorithms. It might just be a matter of changing a parameter in the code, and there are many computer science techniques to do that. The problem is not technical.
The problem is one of incentives and transparency and things like that. A lot of the time, you don’t have transparency from the companies who are building these criminal justice algorithms. They might say it’s proprietary, a trade secret. So in the case of COMPAS, even though it’s been about a decade since that investigation, no changes have been made.
Because actually, you cannot fix it without introducing disparate treatment. If you were to fix it in the algorithm, you would have to have different weights, different treatments, different thresholds for different categories of people — and that would actually violate the law. These are things that human judges account for in a very subtle way when they’re making their decisions. But when we try to do them in algorithmic systems, we have to do them in very crude ways, which, even if theoretically possible, end up not being practically possible because of various constraints.
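To make the disparate-treatment point concrete: when two groups’ risk scores are distributed differently, equalizing an error rate such as the false-positive rate forces a different cutoff for each group. A minimal sketch with made-up scores (the data, the 20% target, and the function are hypothetical, purely for illustration):

```python
def threshold_for_fpr(negative_scores, target_fpr):
    """Lowest cutoff whose false-positive rate stays at or below target_fpr.
    Assumes distinct scores; people scoring >= the cutoff are flagged high-risk."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # false positives we can tolerate
    if allowed == 0:
        return ranked[0] + 1e-9              # cutoff above every negative
    return ranked[allowed - 1]               # exactly `allowed` negatives flagged

# Risk scores of people who were NOT rearrested, by group (made up):
group_a = [0.1, 0.2, 0.3, 0.6, 0.7]
group_b = [0.2, 0.4, 0.5, 0.8, 0.9]

# Holding both groups to a 20% false-positive rate requires different cutoffs --
# which is exactly the group-specific treatment the law forbids.
print(threshold_for_fpr(group_a, 0.2))  # 0.7
print(threshold_for_fpr(group_b, 0.2))  # 0.9
```

The technical fix is a one-line threshold change; the obstacle, as the answer says, is legal and institutional, not computational.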
AUDIENCE MEMBER 3: Hi. Thank you for the book and the conversation. The question I wanted to ask: in the book and in the talk, there’s this general statement that many predictive AI applications are unlikely to work, while there’s hope for generative AI. And I want to ask: is that predominantly a statement, from your perspective, about the underlying technologies, or about the settings in which those technologies are employed?
The reason I ask is that — I work on AI for climate change related applications — there are certain settings, like solar power prediction, where you could use a predictive model or a generative model. And there are also pushes from some of the large technology providers now, with this statement that we should invest in generative AI and large models because they’re going to solve climate problems. That leads to investment in generative AI rather than other techniques, with all of the energy consumption and concentration of power that comes with it. And a statement like “generative AI is better than predictive AI” can kind of feed those pushes. So, circling back: is it a statement about the technologies or the settings?
ARVIND NARAYANAN: Yes, thank you. That’s a great question, and it’s exactly the latter. It’s a statement about the settings — the applications we’re using these technologies for. Even if you were to take a generative AI model and use it in criminal justice, exactly the same list of objections would apply. And they don’t apply in a solar power prediction setting, right? Because you’re not making consequential decisions about people that have massive ethical consequences. So yes, I’m totally with where you’re coming from with that question.
AUDIENCE MEMBER 4: You said something about how the predictions from people in the field about how soon AI would reach certain levels have a terrible track record. Let me suggest that’s a sample bias, because all the bad predictions get all the press. You never hear about the fact that somebody once asked John McCarthy how long it would take to get really good artificial intelligence. He was annoyed by the question, so he gave a somewhat whimsical answer: he said 1.3 Einsteins, and he went on from there. That’s not widely quoted, because it’s not the kind of thing you can make a big laugh out of. So let me caution you on that.
ARVIND NARAYANAN: Thank you, I appreciate that. I should clarify — that was a somewhat superficial presentation of the point just now. There’s a deeper point behind it, which is not so much that the founders were wrong, but that they were wrong for fundamental structural reasons: they were not able — and I’m not blaming them — to see what the future steps in the ladder are, if you will. We use the metaphor of a ladder in our book to discuss progress in AI.
When you’re standing on one step of the ladder, we claim it’s impossible to know what the future steps on the ladder are. So it’s really about that deeper point. But thank you. I appreciate the—
AUDIENCE MEMBER 4: And one other quick point. You got what I’ll call an inexpensive laugh out of the idea that the predictive programs’ AUC was only about 0.7, so they weren’t much better than flipping a coin. You have to ask about the baseline: how good were the people doing this task? Because if the people doing this task are at 55%, then 70% is pretty damn good.
ARVIND NARAYANAN: Yes, okay, all right. So the people were at exactly the same level as the algorithm — and not even trained judges, just laypeople. And not only that: whatever you can accomplish with state-of-the-art algorithms, you can get with a two-variable regression model. Those two variables are the defendant’s age and the number of prior arrests. And we talk about why the use of age is actually morally problematic.
So essentially, the logic behind these systems is: if you’ve been arrested a lot in the past, you’re going to be arrested a lot in the future. That is the entire thing. And we actually say that we would be much happier with a system where that was the hard-coded logic, because it would be apparent to everybody — especially the defendant — what is actually going on. So I’m with you that we have to look at the right baseline. But here, we have a 40-page paper looking at the right baseline, and it doesn’t look good for the algorithm.
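As a sketch of how little machinery such a predictor involves (the defendants, the score weights, and the numbers below are invented for illustration — they are not the actual model or data from the paper): a hand-coded two-variable risk score, with AUC computed by its standard pairwise definition, the probability that a randomly chosen rearrested defendant outscores a randomly chosen non-rearrested one.

```python
# Each defendant: (age, number of prior arrests, rearrested? 1/0) -- made up.
defendants = [
    (22, 5, 1), (45, 0, 0), (30, 2, 1),
    (50, 1, 0), (25, 3, 0), (28, 4, 1),
]

# Hypothetical hard-coded logic: more priors and younger age -> higher risk.
def risk_score(age, priors):
    return 10 * priors - age

def auc(scored):
    """AUC = P(random positive outscores random negative); ties count half."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scored = [(risk_score(age, priors), y) for age, priors, y in defendants]
print(auc(scored))  # 8/9 ≈ 0.89 on this toy data; ~0.7 was the figure cited
```

An AUC of 0.5 is coin-flipping and 1.0 is perfect ranking, which is why 0.7 from a two-variable rule is the relevant baseline for any fancier system.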
DARON ACEMOGLU: Maybe I’ll go to another question. Yes. We have less than three minutes left, so please, very short questions at this—
AUDIENCE MEMBER 5: I had a question about how you calibrate investment across different types of technologies. So maybe you have one type of technology that, with some very small probability, will cause a big return and something that is likely to work but will not change the world? So how do you think about allocating investment between these different technologies?
ARVIND NARAYANAN: Yes, definitely. So this relates to an issue that we call herding in the AI community. In every research community there are fashionable ideas, and people cluster around them. It’s hard to compare this between fields, but just based on vibes, it seems like there is more of this going on in the AI community than in most other research communities.
So today, all of these fancy generative AI systems are based on neural networks, which were sidelined for more than twenty years because people thought they were completely outperformed by another technology called support vector machines — which people would laugh at in today’s context, right? So why did that happen? There wasn’t enough diverse exploration of different paths.
So I do think it might be hard to compute the return on investment, but it seems clear that there needs to be more of a risk appetite, if you will, in diversifying the set of research ideas we invest in.
AUDIENCE MEMBER: Yes. Since the 1980s, I think we’ve seen an ever increasing gap between productivity and wages. There’s a big argument that the stagnation of wages is probably due to a number of factors, including computation replacing and abstracting away a lot of the work. How do you view the impact of something like AI on the productivity of a worker, and how that might also affect wages? And how can we correct that?
DARON ACEMOGLU: That’s a question for me, I think. But you don’t want to hear from me — no, no, no, please go ahead. I want to hear from you.
ARVIND NARAYANAN: Well, this is what we spend a good chunk of our waking hours on — not just automating work, which will, of course, happen and should happen, but also finding uses of AI that will increase the information and capabilities of workers so they can deal with more complex things. But how to get there is the real challenge.
AUDIENCE MEMBER: Thank you so much. I just wanted to ask: lately, I think, there’s been a lot of fear of and backlash against AI in the public. I wanted to know your thoughts on what might be contributing to that, and also how people in tech research or the tech industry can address those fears.
ARVIND NARAYANAN: Definitely. Lots of thoughts on this, but I know we’re running out of time. So let me keep it short.
And yes, you’re absolutely right: according to public opinion surveys, many more people are worried about what AI will mean for them than are excited about it. And I think this is almost more a statement about capitalism than about AI. It varies a lot between countries based on the kinds of worker protections people have come to expect, etcetera; that dramatically moderates their reaction to these exciting — or worrying — technological developments.
And I think in terms of what can be done about it, I think there’s a lot of room to improve the channels of communication between the AI industry and research community with the general public. We’ve talked a lot about how, in many ways, people in general, workers in different domains, have a much better understanding of AI’s potential and limits in their particular application, like law or medicine or whatever, than AI developers do. And so AI developers would benefit a lot from understanding that and not making these overhyped claims.
But at the same time, I think people deserve to understand why it is that companies are confident enough to make these trillion-dollar bets, and to understand new emerging capabilities — which, frankly, almost feels like a full-time job to stay on top of. I think companies can do a lot to aid genuine public understanding, as opposed to just hyping up capabilities. So I think communication could improve in both directions.
DARON ACEMOGLU: Thank you so much. Well, thank you very much, Arvind. That was wonderful. I’m just going to add that it’s a testament to how interested people are. They could have stayed here for another half an hour, but thank you.