Mona Chalabi – TRANSCRIPT
I’m going to be talking about statistics today. If that makes you immediately feel a little bit wary, that’s OK, that doesn’t make you some kind of crazy conspiracy theorist, it makes you skeptical.
And when it comes to numbers, especially now, you should be skeptical. But you should also be able to tell which numbers are reliable and which ones aren’t. So today I want to try to give you some tools to be able to do that. But before I do, I just want to clarify which numbers I’m talking about here. I’m not talking about claims like, “9 out of 10 women recommend this anti-aging cream.”
I think a lot of us always roll our eyes at numbers like that. What’s different now is people are questioning statistics like, “The US unemployment rate is five percent.” What makes this claim different is it doesn’t come from a private company, it comes from the government. About 4 out of 10 Americans distrust the economic data that gets reported by government. Among supporters of President Trump it’s even higher; it’s about 7 out of 10.
I don’t need to tell anyone here that there are a lot of dividing lines in our society right now, and a lot of them start to make sense, once you understand people’s relationships with these government numbers. On the one hand, there are those who say these statistics are crucial, that we need them to make sense of society as a whole in order to move beyond emotional anecdotes and measure progress in an objective way.
And then there are the others, who say that these statistics are elitist, maybe even rigged; they don’t make sense and they don’t really reflect what’s happening in people’s everyday lives. It kind of feels like that second group is winning the argument right now. We’re living in a world of alternative facts, where people don’t find statistics this kind of common ground, this starting point for debate.
This is a problem. There are actually moves in the US right now to get rid of some government statistics altogether. Right now there’s a bill in Congress about measuring racial inequality. The draft law says that government money should not be used to collect data on racial segregation. This is a total disaster.
If we don’t have this data, how can we observe discrimination, let alone fix it? In other words: How can a government create fair policies if they can’t measure current levels of unfairness? This isn’t just about discrimination, it’s everything – think about it. How can we legislate on health care if we don’t have good data on health or poverty? How can we have public debate about immigration if we can’t at least agree on how many people are entering and leaving the country? Statistics come from the state; that’s where they got their name. The point was to better measure the population in order to better serve it. So we need these government numbers, but we also have to move beyond either blindly accepting or blindly rejecting them. We need to learn the skills to be able to spot bad statistics.
I started to learn some of these when I was working in a statistical department that’s part of the United Nations. Our job was to find out how many Iraqis had been forced from their homes as a result of the war, and what they needed. It was really important work, but it was also incredibly difficult. Every single day, we were making decisions that affected the accuracy of our numbers: decisions like which parts of the country we should go to, who we should speak to, which questions we should ask. And I started to feel really disillusioned with our work, because we thought we were doing a really good job, but the one group of people who could really tell us were the Iraqis, and they rarely got the chance to find our analysis, let alone question it.
So I started to feel really determined that the one way to make numbers more accurate is to have as many people as possible be able to question them. So I became a data journalist. My job is finding these data sets and sharing them with the public. Anyone can do this, you don’t have to be a geek or a nerd. You can ignore those words; they’re used by people trying to say they’re smart while pretending they’re humble.
Absolutely anyone can do this. I want to give you guys three questions that will help you be able to spot some bad statistics. So, question number one is: Can you see uncertainty? One of the things that’s really changed people’s relationship with numbers, and even their trust in the media, has been the use of political polls. I personally have a lot of issues with political polls because I think the role of journalists is actually to report the facts and not attempt to predict them, especially when those predictions can actually damage democracy by signaling to people: don’t bother to vote for that guy, he doesn’t have a chance. Let’s set that aside for now and talk about the accuracy of this endeavor.
Based on national elections in the UK, Italy, Israel and of course, the most recent US presidential election, using polls to predict electoral outcomes is about as accurate as using the moon to predict hospital admissions. No, seriously, I used actual data from an academic study to draw this. There are a lot of reasons why polling has become so inaccurate. Our societies have become really diverse, which makes it difficult for pollsters to get a really nice representative sample of the population for their polls. People are really reluctant to answer their phones to pollsters, and also, shockingly enough, people might lie.
But you wouldn’t necessarily know that to look at the media. For one thing, the probability of a Hillary Clinton win was communicated with decimal places. We don’t use decimal places to describe the temperature. How on earth can predicting the behavior of 230 million voters in this country be that precise? And then there were those sleek charts. See, a lot of data visualizations will overstate certainty, and it works — these charts can numb our brains to criticism.
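To make the point about decimal places concrete, here is a minimal sketch of how much uncertainty sampling alone puts around a poll number. It is not from the talk: the function name `margin_of_error`, the sample size of 1,000 and the 48 percent figure are all made up for illustration, and the formula is the standard textbook approximation for a simple random sample. It describes a vote share rather than a win probability, and it ignores the other problems mentioned above, such as people not answering their phones or not telling the truth.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.

    Textbook normal approximation; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical poll (invented numbers): 1,000 respondents, 48% support.
support = 0.48
moe = margin_of_error(1000, support)

print(f"Reported: {support * 100:.1f}%")
print(f"Sampling uncertainty: plus or minus {moe * 100:.1f} points")
print(f"Plausible range: {(support - moe) * 100:.0f}% to {(support + moe) * 100:.0f}%")
```

With these invented numbers the range is roughly six points wide, so a figure quoted to one decimal place suggests far more precision than the sample can support.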
When you hear a statistic, you might feel skeptical. As soon as it’s buried in a chart, it feels like some kind of objective science, and it’s not. So I was trying to find ways to better communicate this to people, to show people the uncertainty in our numbers. What I did was I started taking real data sets, and turning them into hand-drawn visualizations, so that people can see how imprecise the data is; so people can see that a human did this, a human found the data and visualized it. For example, instead of finding out the probability of getting the flu in any given month, you can see the rough distribution of flu season.
This is — a bad shot to show in February. But it’s also more responsible data visualization, because if you were to show the exact probabilities, maybe that would encourage people to get their flu jabs at the wrong time. The point of these shaky lines is so that people remember these imprecisions, but also so they don’t necessarily walk away with a specific number, but they can remember important facts. Facts like injustice and inequality leave a huge mark on our lives. Facts like Black Americans and Native Americans have shorter life expectancies than those of other races, and that isn’t changing anytime soon.