Fei-Fei Li: Spatial Intelligence is the Next Frontier in AI (Transcript)

Read the full transcript of godmother of AI Dr. Fei-Fei Li in conversation with Diana Hu of Y Combinator on June 16, 2025 at AI Startup School in San Francisco.

Opening Remarks

DR. FEI-FEI LI: My entire career is going after problems that are just so hard, bordering delusional. To me AGI will not be complete without spatial intelligence, and I want to solve that problem. I just love being an entrepreneur. Forget about what you have done in the past, forget about what others think of you. Just hunker down and build. That is my comfort zone.

DIANA HU: So I’m super excited here to have Dr. Fei-Fei Li. She has such a long career in AI. I’m sure a lot of you know her, right? Raise your hand.

DR. FEI-FEI LI: I know you too.

DIANA HU: She’s been named the godmother of AI. One of the first projects that you created was ImageNet in 2009, 16 years ago.

DR. FEI-FEI LI: Oh my God, don’t remind me of that now.

DIANA HU: It has over 80,000 citations and it really kicked off one of the legs of the stool for AI, which is the data problem. Tell us about how that project came about. It was pretty pioneering work back then.

The Birth of ImageNet

DR. FEI-FEI LI: Yeah. Well, first of all, Diana and Gary and everybody, thanks for inviting me here. I’m so excited to be here because I feel like I’m just one of you. I’m also an entrepreneur right now. Just started a small company. So very excited to be here.

ImageNet was… Yeah, you’re right. We actually conceived that almost 18 years ago. Time really flies. I was a first year assistant professor at Princeton. Oh, wow. Hi. Hi. Tigers. Yeah.

And the world of AI and machine learning was so different at that time. There was very little data. Algorithms, at least in computer vision, did not work. There was no industry. As far as the public was concerned, the word AI didn’t exist.

But there was still a group of us, starting from the founding fathers of AI, John McCarthy, and then going through people like Geoff Hinton. I think we just had an AI dream. We really, really wanted to make machines think and work.

With that dream, my own personal dream was to make machines see, because seeing is such a cornerstone of intelligence. Visual intelligence is not just perceiving, it’s really understanding the world and doing things in the world. So I was obsessed with the problem of making machines see.

And as I was obsessively developing machine learning algorithms at that time, we did try neural networks, but they didn’t work. We pivoted to Bayesian networks, to support vector machines, whatever it was. But one problem always haunted me, and it was the problem of generalization.

If you’re working in machine learning, you have to respect that generalization is the core mathematical foundation or goal of machine learning. In order to generalize, these algorithms need data. Yet no one had data at that time in computer vision.

And I was in the first generation of grad students who were starting to dabble in data, because I was in the first generation of graduate students who saw the Internet, the big Internet of things.

So fast forward around 2007ish, my student and I decided that we have to take a bold bet. We have to bet that there needs to be a paradigm shift in machine learning. And that paradigm shift has to be led by data driven methods. And there was no data.

So we’re like, okay, let’s go to the Internet, download a billion images, that’s the highest number we could get on the Internet, and then just create the entire world’s visual taxonomy. And we use that to train and benchmark machine learning algorithms. And that was why ImageNet was conceived and came to life.

The AlexNet Breakthrough

DIANA HU: And it took a while until there were algorithms that were promising. It wasn’t until 2012 that AlexNet came out. And that was the second part of the equation for getting to AI: getting the compute, throwing enough of it at the problem, and the algorithms. Tell us about that moment when you started to see that you had seeded it with data, and now the community started to figure more things out for AI.

DR. FEI-FEI LI: So between 2009, when we published this tiny little CVPR poster, and 2012, AlexNet, there were three years in which we really believed that data would drive AI, but we had very little signal in terms of whether that was working.

So we did a couple of things. One is we open sourced, we believed from the get go we have to open source this to the entire research community for everybody to work on this. The other thing we did is we created a challenge because we want the whole world’s smartest students and researchers to work on this problem. So that was what we call the ImageNet Challenge.

So every year we release a testing data set. Well, the whole ImageNet is there for training, but we release testing and then we invite everybody openly to participate. And then the first couple of years was really setting the baseline. You know, the performance was in a 30% error rate. It wasn’t zero or I mean it wasn’t completely random, but it wasn’t that great.

But the third year, 2012, I wrote about this in a book that I published, but I still remember it was around the end of summer that we were taking all the results of the ImageNet Challenge and running them on our servers. And I remember it was late one night that I got a ping from my graduate student. I was home, and the message said, we got a result that really, really stands out and you should take a look.

And we looked into it. It was a convolutional neural network. It wasn’t called AlexNet at that time. That team, Geoff Hinton’s team, was called SuperVision.

It was a very clever play on the word super, as well as supervised learning. So, SuperVision. And we looked at what SuperVision did, and it was an old algorithm.

The convolutional neural network was published in the 1980s. There were a couple of tweaks in terms of the algorithm, but it was pretty surprising at the beginning for us to see that there was such a step change.

And of course, I mean, the rest of the history you all know. We presented this at the ImageNet Challenge workshop at that year’s ICCV in Florence, Italy. And Alex Krizhevsky came and many people came. I remember Yann LeCun also came. And now the world knows this moment as the ImageNet Challenge AlexNet moment.

I do want to say that it’s not just convolutional neural network. It was also the first time that two GPUs were put together by Alex and his team and were used for the computing of deep learning. So it was really the first moment of data, GPUs and neural network coming together.

From Objects to Scenes

DIANA HU: Now, following this arc of intelligence for computer vision, ImageNet was really the seed for solving object recognition. Then right after that, AI got to the point where it could also solve scenes, right? Because you did a lot of work with your students, like Andrej Karpathy, on being able to describe a scene. Tell us about that transition from objects to scenes.

DR. FEI-FEI LI: Yeah, so ImageNet was solving the problem of you present, you’re presented with an image and then you call out objects. There’s a cat, there’s a chair and all that. That’s a fundamental problem in visual recognition.

But ever since I was a graduate student entering the field of AI, I had a dream, I thought it was 100 year dream, which is storytelling of the world, is that when humans open their eyes. Imagine you just open your eye in this room. You don’t just see person, person, person, chair, chair, chair. You actually see a conference room, you know, with screen, with stage, with people, with the crowd, the cameras, you actually can describe the entire scene.

And that’s a human ability that is at the foundation of visual intelligence. And it’s so critical for us to use in our everyday life. So I really thought that problem would take my entire life. When I graduated as a graduate student, I literally told myself: on my deathbed, if I can create an algorithm that can tell the story of a scene, I’ve succeeded. That was how I thought my career would be.

The ImageNet AlexNet moment came and deep learning took off. And then when Andrej and then later Justin Johnson entered my lab, we started to see signals of natural language and vision starting to collide. And then Andrej and I proposed this problem of captioning images, or storytelling.

And long story short, around 2015, Andrej and I published a series of papers that were among the first, along with a couple of concurrent papers, to make a computer that literally captioned an image. I almost felt like, what am I going to do with my life? That was my lifelong goal. You know, it was such an incredible moment for both of us.

And, you know, last year I gave a TED Talk, and I actually used something that Andrej tweeted a couple of years ago around the time he finished image captioning work. That was pretty much his dissertation. I actually joked with him. I said, hey, Andrej, why don’t we do the reverse? Take a sentence and generate an image. And of course, he knew I was joking, and he said, ha, ha, I’m out of here. The world was just not ready.

But now, fast forward, we all know generative AI, and now we can take a sentence and generate beautiful pictures. So the moral of the story is AI has seen incredible growth. And personally, I feel I’m the luckiest person in the world because my entire career started at the very end of the AI Winter, the beginning of AI starting to take off. And so much of my own work, my own career, is part of this change or helped with this change. So I feel so fortunate and lucky and, in a way, proud.

World Labs and Spatial Intelligence

DIANA HU: I think the wildest thing is that, even after achieving your lifelong dream of describing a scene, and even generating them with diffusion models, you’re actually dreaming bigger, because the whole arc of computer vision went from objects to scenes, and now to this concept of worlds. And you actually decided to move from academia, being a professor, to now being a founder and CEO of World Labs. Tell us about what a world is. It’s even harder than scenes and objects.

DR. FEI-FEI LI: Yeah, it is. It is kind of wild. So, of course, you all know the past. It’s really hard to summarize the past five or six years. For me, we’re living in such a civilizational moment of this technology’s progress. As a computer vision scientist, I’m seeing this incredible growth from ImageNet to image captioning to image generation using some of the diffusion techniques.

While this is happening in a very exciting way, we also have another extremely exciting thread, which is language, which is LLMs. Really, in November 2022, ChatGPT blasted open the door to truly working generative models that can basically pass the Turing test and all that.

So this becomes very inspirational, even for someone as old as me, to really think audaciously about what’s next. And I have a habit as a computer vision scientist. A lot of my inspiration actually comes from evolution as well as brain science. I find myself in many moments of my career where I’m looking for the next North Star problem to solve.

I ask myself what evolution has done or what brain development has done, and there’s something that’s really important to notice or to appreciate. The development of human language in evolution took about, if you’re super generous, let’s just say took about 300 to 500 thousand years, less than a million years. That’s the length of evolution that took to develop a human language.

And pretty much humans are the only animals that have sophisticated language. We could argue about animal language, but really language in its totality, in terms of being a tool of communication, reasoning, abstraction, it’s really humans. So that took less than even half a million years.

But think about vision. Think about the capability of understanding the 3D world, figuring out what to do in this 3D world, navigating the 3D world, interacting with the 3D world, comprehending the 3D world, communicating the 3D world. That journey took evolution 540 million years.

The first trilobite developed a sense of vision underwater 540 million years ago. And since then, really, vision was the reason that set off this evolutionary arms race. Before vision, animals were simple. For the half billion years before vision, there were just simple animals.

But in the next half billion years, 540 million years, because of the capability of seeing the world, understanding the world, an evolutionary arms race began. And animal intelligence just started to race each other.

So for me, solving the problem of spatial intelligence, to understand the 3D world, to generate the 3D world, to reason about the 3D world, to do things in the 3D world, is a fundamental problem of AI. To me, AGI will not be complete without spatial intelligence. And I want to solve that problem.

And that involves creating world models, world models that go beyond flat pixels, world models that go beyond language, world models that truly capture the 3D structure and the spatial intelligence of the world.

And the luckiest thing in my life is no matter how old I am, I always get to work with the best young people. So I, you know, I founded a company with three incredible young but world class technologists, Justin Johnson, Ben Mildenhall and Christoph Lassner. And we are just going to try to solve, in my opinion, the hardest problem in AI right now, with incredible talent.

Spatial Intelligence and the Future of AI

DIANA HU: I mean, Chris was the creator of Pulsar, which was the initial seed, before Gaussian splats, for a lot of differentiable rendering. There’s Justin Johnson, your former student, who really has this super systems engineering mind and got real-time neural style transfer working. Then you’ve got Ben, who was the author of the NeRF paper. So this is a super cracked team.

And you need such a cracked team because, as we were chatting a bit about, vision is actually harder than LLMs to some extent. Maybe this is a controversial thing to say, because LLMs are basically 1D.

DR. FEI-FEI LI: Right.

DIANA HU: But you’re talking about understanding a lot of the 3D structures. Why is this so hard and behind language research?

DR. FEI-FEI LI: Yeah, no, I really appreciate Diana, you emphasize how hard our problem is. Yeah, so language is fundamentally 1D, right. Syllables come in sequence. I mean this is why sequence to sequence, sequence modeling is so classic.

There’s something else about language that people don’t appreciate. Language is purely generative. There’s no language in nature. You don’t touch language, you don’t see language. Language literally comes out of everybody’s head. That’s a purely generative signal. Of course, you put it on a piece of paper, it’s there. But the generation, the construction, the utility of language is very, very generative.

The world is far more complex than that. First of all, the real world is 3D, and if you add time it’s 4D. But let’s just confine ourselves to space. It’s fundamentally 3D, so that by itself is a combinatorially much harder problem.

Second, the sensing, the reception of the visual world, is a projection. Whether it’s your eye, your retina or a camera, it’s always collapsing 3D to 2D. And you have to appreciate how hard that is. It’s mathematically ill-posed. This is why humans and animals have multiple sensors. And then you have to solve that problem.
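
The point about projection being ill-posed can be made concrete with a small sketch. It assumes an idealized pinhole camera, which is a simplification and not anything described in the talk; the function names, the focal length and the sample points are all made up for illustration. Two 3D points on the same ray from the camera land on the same 2D image location, so a single image cannot say which one was actually there.

```python
# Idealized pinhole camera (an assumption for illustration only).
# The focal length and the sample points below are made-up values.

def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z), with z > 0, onto the 2D image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def points_on_ray(pixel, depths, focal_length=1.0):
    """List 3D points, one per depth, that all project to the same pixel."""
    u, v = pixel
    return [(u * z / focal_length, v * z / focal_length, z) for z in depths]


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same direction from the camera, twice as far away

# Both points collapse to the same image location, so depth is lost.
same_pixel = project(near) == project(far)

# Going backwards, one pixel is consistent with a whole family of 3D points.
candidates = points_on_ray(project(near), depths=[1.0, 4.0, 8.0])
```

Here `same_pixel` is true and `candidates` contains both `near` and `far`, which is one way to see why a second viewpoint or another sensor is needed to pin down depth.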

And third, the world is not purely generative. Yes, we could generate a virtual 3D world. It still has to obey physics and all that, but there is also a real world out there. You are now suddenly dialing between generation and reconstruction in a very fluid way. And the user behavior, the utility, the use cases are very different. If you dial all the way to generation, we can talk about gaming and Metaverse and all that. If you dial all the way to real, you are talking about robotics and all that. But all this is on the continuum of world modeling and spatial intelligence.

And of course the elephant in the room is there’s a lot of data on the Internet for language, and where is the data for spatial intelligence? You know, it’s all in our heads, of course, but it’s not as easily accessible as language. So these are the reasons it’s so hard. But frankly it excites me, because if it were easy, somebody else would have solved it. And my entire career is going after problems that are just so hard, bordering delusional. And I think this is the delusional problem. Thank you for supporting that.

Model Architectures and Human Vision

DIANA HU: And even thinking about this from first principles, the human brain has a lot more in the visual cortex, in terms of the number of neurons that process visual data, as opposed to language. How does that translate into model architectures that are very different from LLMs, from what you’re finding out?

DR. FEI-FEI LI: Yeah, that’s actually a really good question. I mean, there are still different schools of thought out there, right? There is the LLM. A lot of what we see in LLMs is really riding the scaling law all the way to a happy ending. You can almost just brute force self-supervision all the way.

Constructing world models might be a little more nuanced. The world is more structured. There might be signals that we need to use to guide it. You can call it in the shape of a prior, you can call it supervision in your data. Whatever it is, I think that these are some of the open questions that we have to solve.

But you’re right also, if you think about humans. First of all, we don’t have all the answers even to human perception. How does 3D work in human vision? It’s not a solved problem. We know mechanically the two eyes have to triangulate information, but even after that, where is the mathematical model? And we’re not that great. Humans are not that great as 3D animals. So there’s a lot that is still to be answered.
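
For readers unfamiliar with what triangulating from two viewpoints means mechanically, here is a minimal sketch of the textbook case: two idealized, parallel pinhole cameras. This is an assumption chosen for illustration, not a model of human vision, which the answer above says is still open. The function names, the focal length in pixels and the baseline are hypothetical values.

```python
# Textbook stereo triangulation with two parallel pinhole cameras.
# All numbers are illustrative assumptions, not measurements.

def image_x(point_x, point_z, camera_x, focal_length):
    """Horizontal image coordinate of a point seen from a camera at camera_x."""
    return focal_length * (point_x - camera_x) / point_z


def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Recover depth from the shift (disparity) between the two views."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_length * baseline / disparity


focal_length = 800.0  # in pixels (assumed)
baseline = 0.0625     # metres between the two cameras (assumed)

# A point 2 metres away, slightly to the right of the left camera.
true_x, true_z = 0.125, 2.0

x_left = image_x(true_x, true_z, camera_x=0.0, focal_length=focal_length)
x_right = image_x(true_x, true_z, camera_x=baseline, focal_length=focal_length)

estimated_depth = depth_from_disparity(x_left, x_right, focal_length, baseline)
```

With these values `estimated_depth` recovers the 2 metre depth that a single view could not provide. The sketch covers only the geometry; it says nothing about how the brain combines the two signals.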

So at World Labs, I’m really counting on one thing: that we have the smartest people in the pixel world to solve this.

Applications of 3D World Models

DIANA HU: Is it fair to say that what you’re building at World Labs is these whole new foundation models where the outputs are 3D worlds? And what are some of the applications that you’re envisioning? Because I think you listed everything from perception to generation. There is always this tension between generative models and discriminative models. So what would these 3D worlds do?

DR. FEI-FEI LI: Yeah, so I’m not going to be able to talk too much about the details of World Labs per se, but in terms of spatial intelligence, that’s what also excites me. Just like language, the use case is so huge. From creation, where you can think about designers, architects, industrial designers, as well as artists, 3D artists, game developers, all the way to robotics, robotic learning. The utility of spatial intelligence models or world models is really, really big.

And then there are many related industries, from marketing to entertainment to even Metaverse. I’m actually really, really excited by Metaverse. I know so many people are kind of still like, it’s still not working. I know it’s still not working. That’s why I’m excited, because I think the convergence of hardware and software will be coming. So that’s also another great use case down the road.

DIANA HU: I’m personally very excited that you’re solving Metaverse. I gave it a try in my previous company, so I’m so, so excited that you’re doing that now.

DR. FEI-FEI LI: Yeah, well, I think there’s more signal. I mean, I do think hardware is part of the hurdle, but you need content creation and in Metaverse, content creation needs world models.

From Academia to Entrepreneurship

DIANA HU: Let’s switch gears a little bit. So maybe some of the audience might find your transition from academia to now being a founder CEO to be sudden, but you actually have had a remarkable journey through your whole life. This is not the first time you’ve gone 0 to 1. You were telling me about how you immigrated to the US and you didn’t speak any English in your teens and you even ran a laundromat for a good number of years. Tell us about how all those skills shaped who you are now.

DR. FEI-FEI LI: Right. I’m sure you guys are here trying to listen to how to start a Laundromat.

DIANA HU: That was when you were 19, right?

DR. FEI-FEI LI: Yeah, I was 19. And that was out of desperation. I had no means of supporting my family, my parents, and I needed to go to college to be a physics major at Princeton. So I started a dry cleaning shop and, in Silicon Valley language, I fundraised. I was the founder CEO. I was also the cashier and all the other things, and I exited after seven years. All right, you guys are very kind. I’ve never got claps for my laundromat, but thank you.

But anyway, I think Diana’s point, especially to all of you. I look at you, I’m so excited for you because you’re like literally half my age or even maybe 30% of my age. And you’re so talented. Just do it, don’t be afraid.

You know, my entire career, of course I did the laundromat, but even as a professor, a couple of times I chose to go to departments where I was the first computer vision professor. And that was against a lot of advice. You know, as a young professor, you should go to a place where there’s a community and senior mentors. Of course I would love to have senior mentors, but if they’re not there, I still have to trailblaze my way. Right? So I wasn’t afraid of that.

And then I did go to Google to learn a lot about business in Google Cloud and B2B and all those. And then I started a startup within Stanford because around 2018, AI was not only taking over the industry, AI became a human problem. Humanity will always advance our technology, but we cannot lose our humanity. And I really care about creating a beacon of light in the progress of AI and try to imagine how AI can be human centered, how we can create AI to help humanity.

So I went back to Stanford and created the Human-Centered AI Institute and ran that as a startup for five years. Probably some people were not too happy that I ran it as a startup for five years in a university, but I was very proud of that. So in a way, I think I just love being an entrepreneur. I love the feeling of ground zero, like standing on ground zero. Forget about what you have done in the past, forget about what others think of you. Just hunker down and build. That is my comfort zone and I just love that.

Mentoring Legendary AI Researchers

DIANA HU: The other really cool thing about you, on top of all the awesome things you’ve done, is that you’ve advised a lot of legendary researchers like Andrej Karpathy, Jim Fan, who’s at Nvidia, and Jia Deng, who’s your co-author for ImageNet. They all went on to have these incredible, incredible careers. What really stood out about them when they were students? As advice for the audience, what let you tell that this person was going to change the field of AI?

DR. FEI-FEI LI: So first of all, I’m the lucky one. I think I owe more to my students than the other way around. They really make me a better person, better teacher, better researcher. And having worked with so many, like you said, legendary students is really the honor of my life.

So they’re very, very different. Some of them are just pure scientists trying to hunker down and solve a scientific problem. Some of them are industrial leaders. Some of them are, you know, the greatest disseminators of AI knowledge. But I think there is one thing that unifies them, and I would encourage every single one of you to think about this. Also, for those founders who are hiring, this is my hiring criterion: I look for intellectual fearlessness.

I think it doesn’t matter where you come from. It doesn’t matter what problem we’re trying to solve. That courage, that fearlessness of embracing something hard, going about it, being all in and trying to solve it in whatever way you want is really a core characteristic of people who succeed. I learned this from them. And I really look for young people who have that. As a CEO at World Labs, in my hiring, I look for that quality.

DIANA HU: You’re hiring a lot for World Labs too. So you’re looking for that same trait, right?

DR. FEI-FEI LI: Yes. I got permission from Diana to say that we’re hiring. So yes, we are hiring a lot. We are hiring engineering talent, we’re hiring product talent, we’re hiring 3D talent, we’re hiring generative model talent. So if you feel you’re fearless, you’re passionate about solving spatial intelligence, talk to me or come to our website.

Q&A Session

DIANA HU: Cool. We’re going to open it up for questions for the next 10 minutes.

AUDIENCE MEMBER: Hi, Fei-Fei, thank you for your talk. I’m a big, big, big fan. And yeah, so my question is, more than two decades ago, you worked on visual recognition. I want to start my PhD. What should I work on so I become a legend like you are?

DR. FEI-FEI LI: I want to give you a thoughtful answer, because I can always say do whatever excites you. So first of all, I think AI research has changed. If you’re starting a PhD, you are in academia, and academia no longer has most of the AI resources. It’s very different from my time. Right. The compute and the data are really low in terms of resourcing in academia. And then there are problems where industry can run a lot faster.

So as a PhD student, I would recommend you look for those north stars that are not on a collision course with problems that industry can solve better using better compute, better data, and team science. There are some really fundamental problems that we can still identify in academia where it doesn’t matter how many chips you have, you can make a lot of progress.

First of all, interdisciplinary AI to me is a really, really exciting area in academia, especially for scientific discovery. There’s just so many disciplines that can cross AI. I think that’s a big area that one could go to.

On the theoretical side, I find it fascinating that AI capability has 100% outrun theory. You know, we don’t have explainability, we don’t know how to figure out the causality. There’s just so much in the models we don’t understand that one could push forward on, and the list could go on.

In computer vision, there are still representational problems we haven’t solved. And also small data, that’s another really interesting domain. These are the possibilities.

AUDIENCE MEMBER: Thank you so much, Fei-Fei.

On AGI: Single Model vs Multi-Agent Systems

AUDIENCE MEMBER: Thank you, Professor Li, and congratulations again on your honorary doctorate from Yale. I was honored to be there to witness that moment one month ago. And my question is, in your perspective, will AGI emerge more likely as a single unified model or as a multi agent system?

DR. FEI-FEI LI: The way you ask this question already contains two kinds of definitions. One definition is more theoretical, which is to define AGI as if there is an IQ test that one passes that defines AGI. The other half of your question is much more utilitarian. Is it functional? If it’s agent based, what tasks can it do?

I struggle with this definition of AGI, to be honest. Here’s why. The founding fathers of AI who came together in 1956 at Dartmouth, you know, the John McCarthys and Marvin Minskys among them, wanted to solve the problem of machines that can think. And that’s a problem that Alan Turing also put forward a few years earlier, 10 years or whatever, earlier than them.

And that statement is not a statement of narrow AI, it’s a statement of intelligence. So I don’t really know how to differentiate that founding question of AI from this new word AGI. To me they’re the same thing. But I get it that the industry today likes to talk about AGI as if it’s beyond AI. And I struggle with that because I don’t know how exactly AGI differs from AI.

If we say today’s AGI-ish systems perform better than the narrower AI systems of the 70s, 80s, 90s or whatever, I think that’s right. That’s just the progression of the field. But fundamentally I think the science of AI, the science of intelligence, is to create machines that can think and do things as intelligently as, or even more intelligently than, humans.

So I don’t know how to define AGI. And without defining it, I don’t know if it’s monolithic. If you look at the brain, it’s one thing, you know, you can call it monolithic, but it does have different functionalities. There’s Broca’s area for language, there’s the visual cortex, there’s the motor cortex. So I don’t really know how to answer that question.

Advice on Pursuing Graduate School in AI

AUDIENCE MEMBER: Hi, my name is Yashna and I just want to say thank you. I think it’s really inspiring to see a woman playing a leading role in this field as a researcher, educator and entrepreneur. I wanted to ask, what type of person do you think should pursue graduate school in this rapid rise of AI?

DR. FEI-FEI LI: That’s a great question, and that’s a question even parents ask me. I really think graduate school is the four or five years where you have burning curiosity. You’re led by curiosity. And that curiosity is so strong that there’s no better place to pursue it.

It’s different from a startup, and you have to be a little careful here. A startup cannot be led just by curiosity. Your investors will be mad at you. A startup has a more focused commercial goal, and some part of it is curiosity. But it’s not just curiosity.

Whereas for grad school, that curiosity to solve problem or to ask the right questions is so important that I think those going in with that intense curiosity would really enjoy the four or five years. Even if the outside world is passing by at the speed of light, you’ll still be happy because you’re there following that curiosity.

Open Source Strategies in AI

AUDIENCE MEMBER: First, I wanted to say thank you for your time. Thank you for coming out to speak to us. You mentioned that open sourcing was a big part of the growth from ImageNet. And now with the recent release and growth of large language models, we’ve seen organizations taking different approaches with open source, with some organizations staying fully closed source, some organizations fully releasing their entire research stack, some being somewhere in the middle, open sourcing weights or having restrictive licenses and things of that nature. So I wanted to ask, what do you think of these different approaches to open source and what do you believe the right way to go about open source as an AI company is?

DR. FEI-FEI LI: I think the ecosystem is healthy when there are different approaches. I’m not religious in terms of you must open source or you must close source. It depends on the company’s business strategy.

For example, it’s clear why Meta wants to open source right now. Their business model is not selling the model yet. They’re using it to grow the ecosystem so that people come to their platform. So open source makes a lot of sense. Whereas for another company that is really monetizing the model, even then you can think about an open source tier and a closed source tier. So I’m pretty open to that.

On a meta level, I think open source should be protected. I think having open source efforts, both in the public sector like academia as well as the private sector, is so important. It’s so important for the entrepreneurial ecosystem, it’s so important for the public sector, that I think it should be protected. It shouldn’t be penalized.

Data Challenges for World Models

AUDIENCE MEMBER: Hi, my name is Carl, I flew in from Estonia. I have a question about data. So you called very well the shift in machine learning towards data driven methods with ImageNet. Now you’re working on world models, and you mentioned that we don’t have this spatial data on the Internet. It exists only in our heads. How are you solving this problem? What are you betting on? Are you collecting this data from the real world? Are you doing synthetic data? Do you believe in that or do you believe in good old priors? Thanks.

DR. FEI-FEI LI: You should join World Labs and I’ll tell you.

DIANA HU: Oh, good one.

DR. FEI-FEI LI: Look, as a company I’m not going to be able to share a lot, but I think it’s important to acknowledge that we are taking a hybrid approach. It is really important to have a lot of data, but also to have a lot of quality data. At the end of the day, it is still garbage in, garbage out if you’re not careful with the quality of data.

Overcoming Challenges as a Minority in STEM

DIANA HU: We’ll do one last question.

AUDIENCE MEMBER: Hi Dr. Li, my name is Annie and thank you very much for speaking with us. So in your book The Worlds I See, you talk about the challenges you faced as an immigrant girl and a woman in STEM. I’m curious to know if there was a time when you felt the moment of being a minority in the workplace and, if so, how did you manage to overcome this or persuade others?

DR. FEI-FEI LI: Thank you for that question. I want to be very, very careful and thoughtful in answering you, because we all come from different backgrounds and how each of us feels is very unique. It almost doesn’t even matter what the big categories are. All of us have moments when we feel we’re the minority or the only person in the room.

So of course, I have felt that way. Sometimes it’s based on who I am, sometimes it’s based on my idea, sometimes it’s just based on, I don’t know, the color of my shirt, whatever it is. But this is where I do want to encourage everybody. Maybe it is because, coming to this country when I was young, I kind of have experience with it. It is what it is. I am an immigrant woman. I almost developed a capability to not over index on that. I’m here, just like every one of you. I’m here to learn or to do things or to create things. Thank you.

DIANA HU: That was a great answer.

DR. FEI-FEI LI: And really, all of you, you’re about to embark on something or are in the middle of embarking on something, and you’re going to have moments of weakness or strangeness. I feel this every day, especially in startup life. Sometimes I’m like, oh, my God, I don’t know what I’m doing. Just focus on doing it. Gradient descend yourself to the optimized solution.

DIANA HU: All right. That’s a great way to end. Thank you, Dr. Li.