Big Data Meets Cancer: Neil Hunt at TEDxBeaconStreet (Full Transcript)


Well, I hope none of you came here to hear about Netflix, because I’m not going to say anything about Netflix at all. I have spent the last decade, though, figuring out how to use crowdsourced data to make personalized recommendations, and what I’ve become attracted to is the idea of using crowdsourced data to solve perhaps a more socially important problem: “How do we find cures for cancer?” And you might think that’s an outrageous thing to propose for somebody with no medical background at all, just a technology and entertainment background.

So, let me start by posing a couple of questions: why is cancer different? Well, cancer isn’t one disease; cancer is thousands of diseases, as Stéphane pointed out in the first session this morning, for those of you who were here. And in fact, because it’s thousands of different diseases, there’s no single cure that can solve cancer. We need specific cures for each disease. So what tools can we apply to finding those cures? Because classical clinical trials are not going to solve the problem for us. Let me give a little bit of background here.

This is a long-tail distribution: on the left, you’ve got a few things that happen frequently, and on the right, you’ve got a lot of things that happen infrequently. And so how is this relevant? Well, the stuff on the right actually constitutes most of the area under this curve. So if you can’t solve the many problems that happen infrequently, you can’t solve the problem. If you can’t solve for the 10,000 cancers, you can’t solve cancer.
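To make the long-tail point concrete, here is a small illustrative sketch. It assumes, purely for illustration, that the frequency of each of 10,000 cancer types falls off as 1/rank (a Zipf-like law); the exponent, the type count, and the function names are assumptions for the example, not epidemiological data.

```python
# Illustrative only: assume cancer-type frequencies follow a Zipf-like law
# (frequency proportional to 1 / rank**exponent). The exponent and the
# 10,000-type count are assumptions, not real epidemiology.


def zipf_weights(n_types: int, exponent: float = 1.0) -> list[float]:
    """Relative frequency of each type, most common first."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]


def head_and_tail_share(n_types: int, head_size: int, exponent: float = 1.0) -> tuple[float, float]:
    """Fraction of all cases in the `head_size` most common types, and in the rest."""
    weights = zipf_weights(n_types, exponent)
    total = sum(weights)
    head = sum(weights[:head_size]) / total
    return head, 1.0 - head


if __name__ == "__main__":
    head, tail = head_and_tail_share(n_types=10_000, head_size=10)
    # Prints: top 10 types: 30% of cases; remaining 9,990 types: 70%
    print(f"top 10 types: {head:.0%} of cases; remaining 9,990 types: {tail:.0%}")
```

Under that assumption, the ten most common types account for only about 30% of cases, leaving roughly 70% spread across the other 9,990. A steeper exponent shifts weight toward the head, so the exact split depends entirely on the assumed distribution.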

Now, 20th-century medicine has done a miraculous job of solving problems for the maladies at the head of that curve. And so we have antibiotics, and we have vaccines, and we have Xanax, and we have Tylenol, and hundreds of different drugs that tackle diseases that have a single cause, where a single molecular mechanism can solve those problems. That has done a remarkable job and elevated us, but perhaps many of you have the sense, as I do, that we’ve hit a bit of a stumbling block in the 21st century in terms of making progress with medicine. The diseases left over don’t respond to a one-size-fits-all solution. They need specific solutions for the particular problem that we’re dealing with.

Now, cancer is a scary disease, but cancer is just a software error. Cancer happens when cells replicate their DNA incorrectly. Incorrectly replicated DNA usually just kills the cell, and that’s not a problem because we’ve got plenty more. But sometimes that software error causes the cell to divide and multiply rapidly, and it threatens the life of the whole organism; that’s when the cancer patient dies, and that’s not a good thing. And so, like any software problem, there are many ways that you can introduce defects into that DNA chain that cause the symptoms of cancer. And each of those defects is a separate disease that needs to be solved separately.

The treatments developed for cancer in the 20th century focused on the one thing that all cancers had in common: rapidly dividing cells. But chemotherapy aimed at rapidly dividing cells hits healthy cells as well as cancerous ones, so it’s kind of like carpet bombing a city with the aim of hitting the terrorist cell that’s hiding within it. That’s sometimes effective, but it always causes massive collateral damage, and that’s not a good way to solve the problem.

Cancer, then, is a long-tail disease. On the left, you’ve got some common cancers that occur frequently, and on the right, you’ve got many other cancers with obscure names that occur much less frequently. And the approach we have today for solving cancer works like this: you pick a frequently occurring cancer, say a lung cancer, and you study the molecular pathways that that cancer uses. Then you design a drug that intercepts the proteins involved in that molecular pathway. Then you enroll a clinical trial of 10,000 patients, and then, a decade and millions of dollars later, perhaps you have a solution for one cancer, and perhaps you have a failure.

So we have a hard time tackling the diseases in the long tail because there are so many of them; it’s impossible to muster 10,000 patients for a meaningful clinical trial. But actually, the problem is much worse than that. Even lung cancer is not one disease. Lung cancer is hundreds of different diseases caused by different genetic mutations down the chain here. Each of those requires a different solution, and the drugs we design tackle one or two of these things at most. With the drugs themselves, we’re actually making pretty good progress. This is the molecular model showing how the genetic mutations affect the various proteins that drive the mechanisms that cause cell proliferation, and different pathways are active in different ways for the different genetic modifications that cause this particular cancer.

And the drug designers are able to build drugs that intercept particular pathways on this diagram. These drugs target the proteins fundamental to the cancer itself, so they are much more specific than chemotherapy, much more effective, and have far fewer side effects. But we don’t know how to target them effectively to deal with the thousand or ten thousand different cancers that we know about today.

So I’d like to start with two stories that illustrate the problem. Marty Tenenbaum was the head of the AI lab where I worked as an intern in the 1980s, and he became a friend of mine. In 1998, he was diagnosed with metastatic melanoma. Not the kind caused by exposure to sunlight; this is the kind that kills you. And he was dying. He went to see a number of different doctors, and there were a handful of clinical trials that showed promise for the kind of cancer he had. But each doctor had a different recommendation, and none had data to back up why their particular drug was the right solution for his cancer.

And so he collected what data he could and bet his life on a trial of Canvaxin. It was a trial that ultimately failed: the company went broke, and the drug is no longer on the market. But he had a very specific genetic mutation that that drug was able to treat. So that’s an example of learning from a failed trial: a drug trial that did not lead to a solution in the classic sense, but did solve this one particular cancer.

Here’s a different guy, Lukas Wartman. Lukas was an oncologist, and is an oncologist, I’m happy to say, at Washington University in St. Louis. He spent his life studying acute lymphoblastic leukemia, and he was diagnosed in 2011 with the very disease that he’d spent his life studying. All the treatments he’d helped develop did not solve his problem. So his colleagues took a novel approach. They sequenced the DNA of his cancer and the DNA of his normal cells, and compared the two to find the specific mutations that were driving his cancer.

And they made an unusual discovery. They found that a gene called FLT3 was overexpressed in his cancer. That’s very unusual in ALL; it occurs in less than 1% of cases. But it turns out to be quite common in kidney cancer, and happily, there’s a targeted drug aimed at some kinds of kidney cancer, called Sutent. Sutent saved Lukas’s life, and he’s back to being an oncologist at Washington University.

And so that’s an example of a drug trial that had success in a limited sense, for kidney cancer, but did not reveal the opportunities for other cancers. And there are opportunities there. Clinical trials, the classic way that we seek to discover whether a drug is effective or not, are just not effective at finding solutions for the long tail of cancers. They’re subject to the tyranny of the average. What happens is that if the drug is ineffective for most patients, it’s considered a failure, even if there are a couple of survivors who have a specific genetic mutation in their cancer that this drug helps. That tail is not captured in the clinical trial; it is lost, and the drug is deemed to have failed and is not available for use, even though it has promise in other areas.
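The tyranny of the average comes down to simple arithmetic. The trial below is entirely hypothetical: 1,000 patients, 20 of whom carry a rare mutation that the drug helps, and an assumed 20% pooled response rate as the bar for calling the trial a success. The names `Subgroup`, `response_rate`, and `SUCCESS_THRESHOLD` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    name: str
    patients: int
    responders: int


def response_rate(groups: list[Subgroup]) -> float:
    """Pooled response rate across the given subgroups."""
    return sum(g.responders for g in groups) / sum(g.patients for g in groups)


# Hypothetical trial: 20 carriers of a rare mutation, 980 non-carriers.
trial = [
    Subgroup("carries rare mutation", patients=20, responders=18),
    Subgroup("no rare mutation", patients=980, responders=49),
]
SUCCESS_THRESHOLD = 0.20  # assumed bar for declaring the trial a success

if __name__ == "__main__":
    overall = response_rate(trial)
    verdict = "success" if overall >= SUCCESS_THRESHOLD else "failure"
    print(f"overall response: {overall:.1%} -> {verdict}")  # 6.7% -> failure
    for g in trial:
        print(f"  {g.name}: {response_rate([g]):.0%}")  # 90%, then 5%
```

The pooled response rate of 6.7% falls short of the assumed bar, so the trial reads as a failure, even though 18 of the 20 mutation carriers responded.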

The problem gets even knottier. Most cancers exploit several different molecular pathways, and so these wonder drugs that target specific pathways are now generally accepted to be the building blocks of a cocktail treatment, where you need to take two or three different drugs to block all the pathways that your cancer is using. And when you think about it, there are about 100 different targeted therapies that are approved and available. The best known is Gleevec, approved in 2001, which tackles chronic myelocytic leukemia, and there is a whole string of about a hundred of these things. That makes 10,000 pairwise combinations and a million triples.
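The arithmetic behind these figures is easy to check. In this sketch (variable names are illustrative), the talk’s 10,000 and one million correspond to 100 squared and 100 cubed, which count ordered choices with repeats allowed. Counting distinct cocktails, where order does not matter and no drug appears twice, gives smaller but still very large numbers, and those are then multiplied by the number of cancers, taken here as the talk’s round figure of 10,000.

```python
import math

N_THERAPIES = 100        # "about 100" approved targeted therapies, per the talk
N_CANCER_TYPES = 10_000  # round figure for distinct cancers, per the talk

# The talk's figures: ordered choices with repeats allowed (100**2 and 100**3).
ordered_pairs = N_THERAPIES ** 2
ordered_triples = N_THERAPIES ** 3

# Distinct cocktails: order does not matter and no drug is repeated.
distinct_pairs = math.comb(N_THERAPIES, 2)
distinct_triples = math.comb(N_THERAPIES, 3)

# Each cocktail would, in principle, need testing against each cancer type.
combinations_to_test = (distinct_pairs + distinct_triples) * N_CANCER_TYPES

if __name__ == "__main__":
    print(f"ordered pairs/triples:  {ordered_pairs:,} / {ordered_triples:,}")
    print(f"distinct pairs/triples: {distinct_pairs:,} / {distinct_triples:,}")
    print(f"cocktail x cancer combinations: {combinations_to_test:,}")  # 1,666,500,000
```

With 4,950 distinct pairs and 161,700 distinct triples, evaluating each against 10,000 cancer types gives about 1.7 billion cocktail-and-cancer combinations.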

So if we’re talking about exploring all combinations of pairs and triples of drugs against thousands or tens of thousands of cancers, the number of combinations is just immense, and we’ll never solve that by enrolling clinical trials of 10,000 patients at a time. We need a different method. The two cases I talked about, Marty and Lukas, are specific examples of trials of n equals 1: one participant. Generally we think of one outcome as an anecdote, and scientists consider that you can’t learn from an anecdote. But in fact, these are not isolated cases.

These kinds of experiments are happening hundreds of times a day, as doctors try to figure out the right solution for their particular patient: what drug might work or might help, and how to enroll them in a trial, or in an experiment, that might yield a good result. But those doctors are making those predictions without access to data that would help them draw valid conclusions, and the data that they get from those experiments is not easily captured and made available for future patients. And so we have a catastrophe here of lost learning.

But what if we could pool the data across cancer patients worldwide? What if we could assemble a giant database with everybody’s genetic signatures, the biomarkers, the treatments they have attempted? Which ones were successful, which ones were failures? We could learn a tremendous amount.

We could use big data techniques, machine learning and artificial intelligence, to pull from that database the learnings that are relevant. For a particular patient, we could find other patients who share the specific genetic defects that their cancer is exhibiting; we could find the patients in that set who had successful treatments and unsuccessful treatments; and we could prescribe a particular combination of drugs for that particular patient, in a way that you could never do with a classical clinical trial. This is much less costly in human life than organizing classical clinical trials.
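The matching step described above can be sketched in a few lines. This is a minimal illustration, not Cancer Commons’ method: it assumes a hypothetical record format (`PatientRecord`), treats "similar" as sharing at least one mutation, and ranks treatments by raw response rate among those patients. Gene and drug names such as `GENE_A` and `drug_x` are placeholders, and the records are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical shared-database record for one patient's treatment attempt."""
    patient_id: str
    mutations: frozenset[str]   # mutations observed in the tumor
    treatment: tuple[str, ...]  # drugs given together as one cocktail
    responded: bool             # did the patient respond?


def similar_patients(records, query_mutations, min_shared=1):
    """Patients whose tumors share at least `min_shared` mutations with the query."""
    return [r for r in records if len(r.mutations & query_mutations) >= min_shared]


def rank_treatments(records, query_mutations, min_shared=1):
    """Rank treatments tried by similar patients by observed response rate.

    Returns (treatment, response_rate, times_tried) tuples, best first;
    ties in rate are broken by how often the treatment was tried.
    """
    tally = defaultdict(lambda: [0, 0])  # treatment -> [responders, times tried]
    for r in similar_patients(records, query_mutations, min_shared):
        tally[r.treatment][0] += int(r.responded)
        tally[r.treatment][1] += 1
    ranked = [(t, hits / tried, tried) for t, (hits, tried) in tally.items()]
    return sorted(ranked, key=lambda item: (item[1], item[2]), reverse=True)


# Invented example records; all names are placeholders.
records = [
    PatientRecord("p1", frozenset({"GENE_A", "GENE_B"}), ("drug_x",), True),
    PatientRecord("p2", frozenset({"GENE_A"}), ("drug_x",), True),
    PatientRecord("p3", frozenset({"GENE_A", "GENE_C"}), ("drug_y",), False),
    PatientRecord("p4", frozenset({"GENE_D"}), ("drug_y",), True),
    PatientRecord("p5", frozenset({"GENE_A"}), ("drug_x", "drug_y"), False),
]

if __name__ == "__main__":
    for treatment, rate, tried in rank_treatments(records, frozenset({"GENE_A"})):
        print(" + ".join(treatment), f"{rate:.0%} of {tried}")
```

For a query tumor carrying `GENE_A`, patient `p4` is excluded, and `drug_x` ranks first (2 of 2 responded). A real system would need far more than a set-overlap filter, and would have to account for tiny sample sizes such as the single patients behind `drug_y` here.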

In a 10,000-patient trial, 5,000 don’t even get the drug, and the 10,000 patients in a trial that fails didn’t actually add any particular information to our knowledge; perhaps they should have had the opportunity to try something different. Here, we can use probability models to continuously update the probability of a particular drug working along various dimensions, for people with the different genetic mutations driving their cancers.
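One common way to continuously update the probability of a drug working is a Beta-Binomial model, sketched below as an assumed technique rather than the specific model the speaker had in mind. Each (drug, mutation) pair keeps a Beta(alpha, beta) belief that starts from an assumed uniform prior. Every response adds one to alpha, every non-response adds one to beta, and the posterior mean alpha / (alpha + beta) is the current estimate. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief that a drug helps patients with a given mutation."""
    alpha: float = 1.0  # prior pseudo-count of responses (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of non-responses

    def update(self, responded: bool) -> None:
        # Conjugate Beta-Binomial update: one observed outcome adds one pseudo-count.
        if responded:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the response probability."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical shared store: (drug, mutation) -> current belief.
beliefs: dict[tuple[str, str], BetaBelief] = {}


def record_outcome(drug: str, mutation: str, responded: bool) -> float:
    """Fold one n-of-1 outcome into the (drug, mutation) belief; return the new estimate."""
    belief = beliefs.setdefault((drug, mutation), BetaBelief())
    belief.update(responded)
    return belief.mean


if __name__ == "__main__":
    print(round(record_outcome("drug_x", "GENE_A", True), 2))   # 0.67
    print(round(record_outcome("drug_x", "GENE_A", True), 2))   # 0.75
    print(round(record_outcome("drug_x", "GENE_B", False), 2))  # 0.33
```

Starting from Beta(1, 1), one response moves the estimate for (`drug_x`, `GENE_A`) from 0.5 to about 0.67, and a second moves it to 0.75. A single failure for (`drug_x`, `GENE_B`) moves that pair down to about 0.33. Each n-of-1 outcome therefore shifts an estimate instead of being discarded as an anecdote.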

As we learn that one trial failed, we can downgrade the probability for that drug in that case, and maybe upgrade it in other cases as well. So this is an interesting idea that I think is powerful. We call it open science: the practice of science where all of the data and all of the conclusions from the experiments are published for everyone to share and to learn from more rapidly. And rapid learning communities: groups of patients, doctors, and institutions organized around a database, sharing all this data. This is a future way to find solutions for the cancers down the long tail.

There are a number of different challenges. The one that leaps to mind immediately is that if we start sharing this data in a giant database, patient privacy is a big issue. Well, fear of death is an even bigger issue. I put it to you that most patients are willing to contribute in this situation.

Competition: the reluctance of many institutions to publish and share their data is rational and valid. For example, publishing the data to a shared database might preclude being first to publish a scientific paper, and the flow of scientific papers is critical to the ongoing funding of those institutions. So, this is a fundamental problem that we have to overcome.

Fortunately, the smaller institutions have a big incentive to pool their data in order to better compete with the bigger institutions, which have more data in the first place. So this is a solvable problem. There is also a challenging ethical concern we have to struggle with: these kinds of n=1 experiments, for off-label use of drugs, or for drugs that were found to be safe but not approved for use, are considered beyond the standard of care, and they’re not typically covered by your normal insurance company. So these experiments and treatments are available only to wealthy and well-connected patients who have the means to enroll in these kinds of trials.

How should we feel about a treatment regime like this that’s available only to the select few and not to everybody? Is that OK? Or is that a big problem? Early adopters in any field always pay outsized costs, take outsized risks, and get outsized benefits, with a view to advancing the technology for everybody else who follows. And the same argument pertains here. The key, I think, is to make sure that we have an alignment of incentives among the patients who bet with their lives, the doctors who bet with their reputations, and the drug companies and insurance companies who bet with their money. If we can set it up so that there’s funding driven towards research and investigation of new drugs and new therapies, then we’ll be successful in this endeavor, too. And this will lead to bigger and better treatments for everyone in the long run.

The final problem I want to talk about is access to drugs. Access to these targeted therapies is typically tied up in red tape, and that’s a challenge here. Clinical trials, the classic way to get hold of these things, are quite limited: they’re not available to patients with confounding conditions, for example, which describes most people by the time they get to cancer treatment. And they’re limiting in that they don’t typically allow treatment in conjunction with other treatments, so the cocktail approach is challenging. And access to the drugs outside of trials is tough as well.

But these are social and political problems, and if we have data, we can solve these problems, too. So the starting point here is that we need to build a grassroots movement to start building these knowledge bases about the cancers, the treatments, and the outcomes that were successful and unsuccessful, because that’s how we make progress. Cancer Commons is a non-profit that’s dedicated to setting up a knowledge base, a rapid-learning community around cancer. And I would ask you, if any of you should be unlucky enough to be diagnosed with cancer, to consider sharing your data with Cancer Commons.

And potentially, a scientist or a researcher poking around in the database may find some novel therapy or combination that would benefit you, and that’s a big advantage. And certainly, your data helps the patients who follow, by expanding the state of learning. So I’d like a show of hands. Who’s a cancer survivor, or has suffered from a cancer to date? And who has a friend or loved one who has? And is there anyone who hasn’t got their hand up who’s scared that they will, next time?

So, my big idea: we’re not going to solve cancer with classical clinical trials, one step at a time, because it’s a long-tail disease with tens of thousands of variations. But if we can apply the same machine-learning and big data techniques that Google uses to refine search results, Netflix uses to make movie recommendations, and the NSA uses to find terrorist phone calls, we have a new tool in our arsenal that we can use to tailor drug therapies to the long tail of cancer.

Thank you.
