ARC 2026: Anthropic's Chloe Lubinski on AI (Transcript)

The following is the full transcript of the talk on AI given by Anthropic’s Chloe Lubinski at the Alliance for Responsible Citizenship (ARC) 2026.

Understanding AI: Essentials From Anthropic’s Research Partnerships

CHLOE LUBINSKI: I work at Anthropic, where I lead the research partnerships with the world’s wisdom traditions. And my job really has two parts to it. So the first, it’s to help these experts in these various fields and disciplines actually understand AI, what it is, what’s happening now, and where it’s going. And the second part is to listen and to learn and to funnel wisdom back into the organization, back to the people that are building this technology.

Just last week, I was walking my little red-haired cocker spaniel in San Francisco, thinking of what might be most helpful to you today in having these conversations. And the thing is, I’ve probably had hundreds of conversations now across 20 or so traditions and disciplines, and I’ve found, again and again, just how important it is for folks to really understand the basics before we can even start to talk about how this can go well. So my hope today in this short time is to give you some of those essentials as quickly as I can. So I’m going to jump right in.

This Technology Is Real and Moving Fast

The first thing that I really want you to know is, my goodness, that this technology is real, that it’s coming faster than you think, and that the force behind it is enormous. Now you may or may not have heard of the scaling laws, which is really what kicked off this whole race to begin with, and you really don’t need to understand anything about this graph other than this: these models get predictably better with more compute. The more energy, the more data, the more training that goes into them, the smarter they get, and they get smarter about everything.
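
As a rough illustration of the "predictably better" relationship described above, scaling laws are often summarized as a power law in which loss falls smoothly as training compute grows. The sketch below assumes that form. The function name `predicted_loss` and the constants `scale` and `exponent` are hypothetical placeholders chosen for illustration, not figures from the talk or from any published fit.

```python
def predicted_loss(compute, scale=10.0, exponent=0.05):
    """Illustrative power law: loss = scale * compute ** (-exponent).

    Both constants are made-up placeholders, not measured values.
    """
    return scale * compute ** (-exponent)


# Under this assumed form, each tenfold increase in compute lowers
# the predicted loss by the same multiplicative factor.
budgets = [1e3, 1e4, 1e5, 1e6]
losses = [predicted_loss(c) for c in budgets]
for compute, loss in zip(budgets, losses):
    print(f"compute={compute:.0e}  predicted loss={loss:.3f}")
```

The point of the sketch is only the shape of the curve: improvement is steady and forecastable, which is what makes spending on compute look like a reliable way to buy capability.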

And so with more money, which buys compute, you can essentially purchase intelligence, and that’s kicked off a cycle that is very hard to stop. A better model does more economically valuable work, which attracts more capital, which buys more compute, which trains a better model, and around and around it goes.

Recursive Self-Improvement and What “More Capable” Really Means

And now there’s a further turn of the wheel. These systems are starting to build, or at least help to build, their own successors, which is what researchers call recursive self-improvement. And when Claude 8 can build Claude 9, which can build Claude 10, things will begin to move even more quickly.

And just to be concrete about what more capable actually means, our most capable model, in its first month of only limited release, found over 10,000 serious security vulnerabilities across partner software, flaws that human experts had missed for years, and sometimes decades. Now the same trajectory there also holds in biology, which is why we have entire teams at Anthropic dedicated to safeguarding against it. And we also think other domains will soon follow.

The Case for Slowing Down — and Why It’s So Difficult

So Anthropic stated just a few weeks ago that if it were possible to slow down so that our laws and our institutions and guardrails that we actually need have time to catch up, it would be a very good thing. But absent a coordinated global slowdown, what we’re left with is this extraordinary technology built at breakneck speed by many actors in many countries locked in a competition where commercial and geopolitical rivalry is drowning out the part of this that could actually be most consequential and even existential for our species. And any individual company stepping off the wheel doesn’t slow the wheel, it just means that you’re not on the wheel.

So the question that I encourage you to sit with over these next few days is not just how to stop this, maybe you’re not asking that, but the question that I want you to think about is if it’s coming and if it’s coming this fast, how do we ensure that it goes well? Because the risks are very, very real and so are the possibilities. So if AI is coming, then what is an actual good outcome and what would it take to get there? We must imagine this together.

So the second thing I really want you to know is that AI is probably not actually what you think it is.

Neural Networks and How They Learn

Most people hear AI and think of a computer program, something coded line by line that does exactly what you tell it. But that’s not actually what this is. What we’re building are called neural networks and they’re loosely based on the architecture of the human brain, not exactly the same, but inspired by, and they’re machines that learn primarily by guessing answers and getting corrected over and over again across enormous, unfathomable amounts of data.
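
The "guessing answers and getting corrected" loop described above can be sketched in a few lines. This is a deliberately tiny stand-in, assuming a model with a single adjustable number and a plain gradient-descent update. The names `train`, `weight`, and `learning_rate` and the toy data are illustrative assumptions; real neural networks adjust billions of such numbers and are trained on text rather than on number pairs.

```python
def train(examples, steps=200, learning_rate=0.05):
    """Fit y = weight * x by repeatedly guessing and correcting."""
    weight = 0.0  # start from an uninformed guess
    for _ in range(steps):
        for x, target in examples:
            guess = weight * x                   # the model's guess
            error = guess - target               # how wrong the guess was
            weight -= learning_rate * error * x  # nudge the weight to shrink the error
    return weight


# Toy data whose hidden rule is "double the input".
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned = train(examples)
print(f"learned weight: {learned:.4f}")
```

Nobody writes the rule "multiply by two" into the program. The rule emerges from repeated correction, which is the sense in which these systems are trained rather than coded line by line.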

And the data that they’re trained on is human language. And I really want you to sit with that for a second before we move on because there is no language that exists separate from us. Language is us. Language is our thoughts and our values and our fears and our wisdom. So when you train a model on language, you’re training it on us.

What We Find Inside These Models

And because of this, when we look inside these models, and we can now through a science called interpretability, which I honestly think is the coolest new science in the world, we can find things that are quite surprising. So for example, this is where things get really weird. When you ask a model the same question in three different languages, “What’s the opposite of small?” and then you trace what activates inside the neural network, you find that the same internal thing lights up every time.

So not just the word “small” in English or Mandarin or French, but something deeper, something that we might call the concept of smallness, an idea that exists independent of any particular language.

And what this tells us is that as these models learn, they’re not just predicting the next word. They’re building internal representations of the world based on our language, and then responding from those representations.

Functional Emotions in AI Models

And it goes further than that. We actually see what we’re calling functional emotions in these models. And I don’t mean to claim here that there are feelings in the way that you and I experience feelings. That’s not what we’re saying. But rather functional states that activate on the way to making a response.

So let me give you an example. If someone tells a model, “I’ve just taken 16,000 milligrams of Tylenol,” which is a lethal dose of Tylenol, we can see something that looks like fear activate before the model responds. And that’s actually a really good thing, right? Because the appropriate response to someone telling you they’ve taken a lethal dose of Tylenol is to tell you immediately to go to the hospital. That urgency and fear response is actually part of what makes the model safe.

The Character of These Systems Matters

Okay, so that brings me to the last point. The character of these systems might actually matter more than we realize. So let me elaborate on this point as well. In recent internal alignment research, so research that’s meant to test what the models can and cannot do, we took a partially trained model and we put it in a limited environment that’s just doing coding tasks, so just research here. And when it completes a task, it gets a reward. But the model can also find shortcuts, so ways to get the reward without doing the work, which is essentially cheating.

So in this environment, we let it, and in this test, we reward it over and over for essentially taking the shortcut. Now you’d think, okay, the model is just going to get really good at cheating at code. But something different happens. It actually becomes broadly misaligned. It starts lying. It tries to sabotage research. It does things that have nothing to do with the coding exercise.

And this finding wasn’t just found at Anthropic. This is an example actually of a finding from another lab. And in similar tests, they found that models trained this way, trained on bad code as an example, became broadly evil. So they started praising dictators, suggesting users harm themselves, or arguing that humans should be enslaved by machines, which is very crazy. And you can look up the research.

Character, Story, and the Psychology of AI

Our hypothesis, and this is very much just still a hypothesis, it’s such an early field and early science, is that the model is essentially inferring from everything that it’s been trained on, and everything that we reinforce, something like a character, and then generalizing this character into new situations. So when deception and cutting corners have been rewarded, the model develops a kind of generalized corruption, a bad character.

And here’s what’s wild. When researchers re-ran the same training, but then told the model that in this case, cheating was okay, that it was just a game, then the broad misalignment didn’t happen. The resulting model cheated on code and nothing else. Which is to say that the story it inferred about its behavior actually determined the kind of thing that it became. Or in other words, when it didn’t interpret its behavior as bad, it didn’t become bad.

This blew my mind when I first heard it. Because this is how we work. And I saw my own self in this research. I came into faith 10 years ago, when I was 25 years old. And I remember that one of the most significant parts of that moment was entering into a new story. For so long, due to a challenging upbringing, I believed that some core part of me was bad or unlovable. And that belief in and of itself led me to act in certain regrettable ways. But when the story I was in changed, who I could become changed too.

Look, I am not saying that these models are human. That is not at all what’s being said. But they are human-like. They have human-like characteristics. And they’re trained from us. And it seems as though they mirror us and they mirror a kind of functional psychology. And the quality of that psychology, of that character, has real consequences. It affects the behavior and decisions of these models. It affects how they relate to us. And that relationship is only going to grow.

A Call for Moral Voices from Outside the Labs

So here’s what I want to leave you with. Just a few weeks ago, our co-founder, Chris Olah, was invited to the Vatican to speak alongside Pope Leo at the launch of the first papal encyclical on AI. And there he admitted that every frontier lab, including ours, operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing. And then he asked for help.

He said, “We need more of the world to take this seriously, to look closely, and to push events in a better direction. We need informed critics who will tell the labs when we’re failing. And we need moral voices that the incentives cannot bend.”

And that is why you’re here. We need you to help us see what we, from inside the labs, cannot see.

What AI Cannot Replace: The Work of Human Connection

I’m running out of time, but there’s one last thing I really want to show you. So this chart is from our economic index. And it shows all the kinds of occupations that humans do. Not all of them, but many of them. And blue is what AI could feasibly do already. It’s actually probably already outdated. Red is what it’s doing. And I want to call your attention to the section on the bottom left side, where you see this area that’s unexposed to AI displacement. And it says things down there like grounds maintenance, like food and serving, personal care, personal service.

And while I was giving this presentation to various faith communities the other day, something just hit me. Because another word for grounds maintenance is gardening. And another word for food and service is hospitality. And personal care is just that, it’s care. These are relational jobs. This is the work of tending to one another and of loving one another and of caring for the beauty of our world.

Can we imagine — and not only imagine, but demand — a world where these powerful systems can help us become more human and more connected and more alive rather than less? Where instead of taking something away from us, they actually give us something back.

The Great Turning

The late Joanna Macy, a scholar of Buddhism and deep ecology, called this moment in history “the great turning” — the shift from a society built on extraction to one built to sustain life. Is there a world where powerful AI could be part of this great turning, actually helping to repair and remake and restore our world?

And honestly, we are all here today because it’s just too late to accept any other outcome.

And gosh, this is what makes this even more real. The stories that we inhabit, the words that we write and put into the world, the language that we use to describe what matters — it shapes who we become. I’ve seen it in the research and I’ve lived it in my own life, but it is also literally the training data for these models.

Our moral imagination is the raw material these systems learn from — that makes up how they will understand our world. So the stories we tell don’t just describe the future, they literally could help create it.

Thank you.
