Sebastian Wernicke – Data scientist
Roy Price is a man that most of you have probably never heard about, even though he may have been responsible for 22 somewhat mediocre minutes of your life on April 19, 2013. He may have also been responsible for 22 very entertaining minutes, but not very many of you. And all of that goes back to a decision that Roy had to make about three years ago.
So you see, Roy Price is a senior executive with Amazon Studios. That’s the TV production company of Amazon. He’s 47 years old, slim, spiky hair, describes himself on Twitter as “movies, TV, technology, tacos.” And Roy Price has a very responsible job, because it’s his responsibility to pick the shows, the original content that Amazon is going to make. And of course that’s a highly competitive space. I mean, there are so many TV shows already out there, that Roy can’t just choose any show. He has to find shows that are really, really great. So in other words, he has to find shows that are on the very right end of this curve here.
So this curve here is the rating distribution of about 2,500 TV shows on the website IMDb, and the rating goes from one to 10, and the height here shows you how many shows get that rating. So if your show gets a rating of nine points or higher, that’s a winner. Then you have a top two percent show. That’s shows like “Breaking Bad,” “Game of Thrones,” “The Wire,” so all of these shows that are addictive, where, after you’ve watched a season, your brain is basically like, “Where can I get more of these episodes?” That kind of show.
On the left side, just for clarity, here on that end, you have a show called “Toddlers and Tiaras” which should tell you enough about what’s going on on that end of the curve.
Now, Roy Price is not worried about getting on the left end of the curve, because I think you would have to have some serious brainpower to undercut “Toddlers and Tiaras.” So what he’s worried about is this middle bulge here, the bulge of average TV, you know, those shows that aren’t really good or really bad, they don’t really get you excited. So he needs to make sure that he’s really on the right end of this.
So the pressure is on, and of course it’s also the first time that Amazon is even doing something like this, so Roy Price does not want to take any chances. He wants to engineer success. He needs a guaranteed success, and so what he does is, he holds a competition.
So he takes a bunch of ideas for TV shows, and from those ideas, through an evaluation, they select eight candidates for TV shows, and then he just makes the first episode of each one of these shows and puts them online for free for everyone to watch. And so when Amazon is giving out free stuff, you’re going to take it, right? So millions of viewers are watching those episodes.
What they don’t realize is that, while they’re watching their shows, actually, they are being watched. They are being watched by Roy Price and his team, who record everything. They record when somebody presses play, when somebody presses pause, what parts they skip, what parts they watch again. So they collect millions of data points, because they want to have those data points to then decide which show they should make. And sure enough, so they collect all the data, they do all the data crunching, and an answer emerges, and the answer is, “Amazon should do a sitcom about four Republican US Senators.” They did that show.
So does anyone know the name of the show? (Audience: “Alpha House.”) Yes, “Alpha House,” but it seems like not too many of you here remember that show, actually, because it didn’t turn out that great. It’s actually just an average show, actually — literally, in fact, because the average of this curve here is at 7.4, and “Alpha House” lands at 7.5, so a slightly above average show, but certainly not what Roy Price and his team were aiming for.
Meanwhile, however, at about the same time, at another company, another executive did manage to land a top show using data analysis, and his name is Ted, Ted Sarandos, who is the Chief Content Officer of Netflix, and just like Roy, he’s on a constant mission to find that great TV show, and he uses data as well to do that, except he does it a little bit differently.
So instead of holding a competition, what he and his team did was look at all the data they already had about Netflix viewers, you know, the ratings they give their shows, the viewing histories, what shows people like, and so on. And then they used that data to discover all of these little bits and pieces about the audience: what kinds of shows they like, what kind of producers, what kind of actors. And once they had all of these pieces together, they took a leap of faith, and they decided to license not a sitcom about four Senators but a drama series about a single Senator. You guys know the show? Yes, “House of Cards,” and Netflix, of course, nailed it with that show, at least for the first two seasons. “House of Cards” gets a 9.1 rating on this curve, so it’s exactly where they wanted it to be.
Now, the question of course is, what happened here? So you have two very competitive, data-savvy companies. They collect all of these millions of data points, and then it works beautifully for one of them, and it doesn’t work for the other one. So why?
Because logic kind of tells you that this should be working all the time. I mean, if you’re collecting millions of data points on a decision you’re going to make, then you should be able to make a pretty good decision. You have 200 years of statistics to rely on. You’re amplifying it with very powerful computers. The least you could expect is good TV, right?
And if data analysis does not work that way, then it actually gets a little scary, because we live in a time where we’re turning to data more and more to make very serious decisions that go far beyond TV. Does anyone here know the company Multi-Health Systems? No one. OK, that’s good actually.
OK, so Multi-Health Systems is a software company, and I hope that nobody here in this room ever comes into contact with that software, because if you do, it means you’re in prison.
If someone here in the US is in prison, and they apply for parole, then it’s very likely that data analysis software from that company will be used in determining whether to grant that parole. So it’s the same principle as Amazon and Netflix, but now instead of deciding whether a TV show is going to be good or bad, you’re deciding whether a person is going to be good or bad. And mediocre TV, 22 minutes, that can be pretty bad, but more years in prison, I guess, even worse.
And unfortunately, there is actually some evidence that this data analysis, despite having lots of data, does not always produce optimum results. And that’s not because a company like Multi-Health Systems doesn’t know what to do with data. Even the most data-savvy companies get it wrong. Yes, even Google gets it wrong sometimes.
In 2009, Google announced that they were able, with data analysis, to predict outbreaks of influenza, the nasty kind of flu, by doing data analysis on their Google searches. And it worked beautifully, and it made a big splash in the news, including the pinnacle of scientific success: a publication in the journal “Nature.” It worked beautifully for year after year after year, until one year it failed. And nobody could even tell exactly why. It just didn’t work that year, and of course that again made big news, including now a retraction of a publication from the journal “Nature.”