Chris Hansen – TRANSCRIPT
Take a moment to consider the economy in which we live. The global economy, the US economy, the Colorado economy. The connections between all the different parts. It’s kind of a mind-boggling amount of complexity. The US produces about 20 trillion dollars of goods and services every year, and those are just the ones we count; we miss plenty along the way. I’ve been fortunate to work in my career on some pretty big, complicated models to try and make sense out of that: two, three, four thousand variables. But what becomes clearer the further you dive into that type of work is that it gets more and more difficult with every variable you add.
Here’s a pretty preeminent economic forecaster you may have heard of, Alan Greenspan, in charge of the US Federal Reserve for many years. He was being interviewed by one of my favorite newsmen, Jon Stewart, and he said basically, “Look, I can do all of these fancy equations, I can get all these variables, but if you could just tell me how people feel, how they are reacting to the world around them, what’s their emotional mood that day, I could really start to make sense out of the economy.” And this was kind of the challenge that... I was on a team that just decided to take this on. Can we start to figure out: how does the country feel today? How does Colorado feel today? And so, this is the work that we started to dive into, and I’m happy to share with you today a few of the things that we’ve started to discover.
So, the first thing that I often get asked is, “Well, if you’re using things like Twitter data, social media data, to figure out how we all feel and come up with this index of the mood of the nation, isn’t that just polling? Why don’t you just call people on the phone and ask them how they feel, and then put it all together?” Well, there are a few problems with that. One, raise your hand if you still have a landline at home. That’s going to be highly correlated to age. I just want to tell you I got rid of my landline 15 years ago; probably never going back.
It’s a big problem for pollsters, right? They are trying to call people, trying to get in touch with people and get a scientific sample, and that’s great. We need scientific polling to answer lots of important questions, but it’s getting more and more difficult to do that work. The other thing that is really hard about polling is this idea of the Hawthorne effect, meaning if you know you are being observed, you change your answer. And this goes back to turn-of-the-century time-clock studies in factories in Massachusetts, for the historians in the room. If you’re being watched, you’re going to work a little bit faster, right? If you’re getting a call from a pollster, and he says, “How do you feel today?” you might say, “Well, you know, I’m doing all right, I’m doing OK.” And so, the results can get a little bit skewed.
What we were trying to do was, “OK, what if we could passively monitor people just by the words that they are using on social media? Figure out their mood by the way they were using language?” And that’s exactly what we were trying to do with this project. Now, there are some problems on the social media side, too, right? I mean, it’s nice because it’s immediate: every millisecond, there are thousands of tweets being sent. In fact, about six hundred million tweets a day now around the globe. I’m sure there are a thousand being sent as I’m speaking, right now.
Right? Everybody is live tweeting? You’re being watched; that’s the takeaway from this talk. But if you add up all of these Twitter users around the globe, you can get this really instantaneous feedback on how people are feeling. But there is a problem on that side: what if you are oversampling, counting too many people who do not represent the full population? Well, when Twitter started, that was a huge problem, right? It was the 25-year-old white guys in San Francisco; they were the only ones tweeting. The good news is, since then, Twitter use in the US and around the globe, in places like Western Europe, places like Saudi Arabia, has increased so much that now we have a sample that looks a lot like the rest of the population.
So, here in 2009, you can see kind of a heat map of the predominance of male use on Twitter. In the US right now, it’s actually 51% female, 49% male. So this is almost perfect. It is good we’ve got over-representation from women. That probably makes it a better sample. The other problem that comes up is around ethnicity. What if there are too many Caucasians or too many of whatever group? The other good news is that the ethnic mix on Twitter, in the US in particular, now looks basically the same as the percentages in the general population. So, this data is getting better every single day. The other thing that’s happening is that we’ve got global use. These charts are a little bit tough to read, but the red line, which is US and Canadian data as a percent of the total, is going down. It used to be 67 to 80%; now less than a third of all tweets are sent from Canada and the US. So it’s a democratizing user base around the globe.
So, our signal that we can tap into to make these measurements is getting better and better every day. With a good data set and some great scientists I work with, we started to dive into this idea of understanding sentiment. Now, how do you get your hands around this? Let me give you a tangible example. Let’s say I give you the word “home” versus the word “house.” Which one is warmer? Which one is more positive? You tweet out, “I can’t wait to get home” versus “Hey, I’m headed back to my house.” Home is a warmer word, right? And let’s repeat that process for tens of thousands of words in the English language, and Spanish, and Italian, and German, etc. That’s how we can start to build up this map.
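As a rough illustration of the word-by-word scoring idea described above, here is a minimal Python sketch. The word list, the 1-to-9 scale, the specific score values, and the function names are all assumptions made up for this example; they are not the team’s actual lexicon or method.

```python
import re

# Hypothetical happiness scores on a 1 (negative) to 9 (positive) scale.
# These values are invented for illustration only.
WORD_SCORES = {
    "home": 7.8,
    "house": 6.3,
    "wait": 5.0,
    "headed": 5.2,
    "back": 5.1,
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_text(text, scores=WORD_SCORES):
    """Average the scores of the words we have ratings for.

    Returns None when no word in the text has a rating.
    """
    rated = [scores[w] for w in tokenize(text) if w in scores]
    if not rated:
        return None
    return sum(rated) / len(rated)

home_tweet = score_text("I can't wait to get home")
house_tweet = score_text("Hey, I'm headed back to my house")
print(home_tweet, house_tweet)
```

With these made-up scores, the “home” tweet comes out warmer than the “house” tweet. A real lexicon would cover tens of thousands of words per language, as described in the talk.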
The other thing that we needed: it turns out that approach only gets you about 50% of the way there. We also started using emoticons. It turns out about 5% of tweets, give or take, depending on the country, use a frowny face, a smiley face, an emoji of some sort. And then we can start to map the words that are next to those emoticons.
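One way the emoticon idea could be sketched in Python is shown below. It treats a tweet’s emoticon as a label and counts which words show up alongside happy versus sad faces. The emoticon sets, the scoring formula, and the choice to use co-occurrence anywhere in the same tweet (rather than strict adjacency) are assumptions for illustration, not the method the team actually used.

```python
from collections import Counter
import re

# Small, hypothetical emoticon sets; a real system would cover many more.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}

def emoticon_word_scores(tweets, min_count=1):
    """Score words from -1 (seen only with sad faces) to +1 (only happy)."""
    pos, neg = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.split()
        has_pos = any(t in POSITIVE for t in tokens)
        has_neg = any(t in NEGATIVE for t in tokens)
        if has_pos == has_neg:
            # No emoticon at all, or mixed signals: skip this tweet.
            continue
        # Collect the distinct words, ignoring the emoticon tokens themselves.
        words = set()
        for t in tokens:
            if t not in POSITIVE and t not in NEGATIVE:
                words.update(re.findall(r"[a-z']+", t.lower()))
        (pos if has_pos else neg).update(words)

    scores = {}
    for word in set(pos) | set(neg):
        total = pos[word] + neg[word]
        if total >= min_count:
            scores[word] = (pos[word] - neg[word]) / total
    return scores

sample = [
    "so happy to be home :)",
    "home at last :-)",
    "stuck in traffic :(",
    "traffic again :(",
    "just a plain tweet about traffic",
]
print(emoticon_word_scores(sample))
```

In practice a higher `min_count` would be needed so that words seen only once or twice do not get extreme scores.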
And with those two techniques, plus some adjustment for things like sarcasm, and irony, and cuss words, we can build up this really great map of how people are expressing themselves on social media. By the way, cuss words are the toughest things to figure out. There are a lot of different ways to use cuss words in the English language. And this slide just shares a little bit about how we were testing that. We were comparing human raters to our machine raters, and the more we did that and fine-tuned the model, we basically got those two results to match up.
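The talk does not say which statistic was used to compare human and machine raters. One common choice would be a Pearson correlation between the two sets of scores, sketched below in Python. The function name and the sample ratings are hypothetical and are not results from the project.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)

# Made-up ratings of the same five tweets by people and by the model.
human_ratings = [7.0, 3.5, 5.0, 8.0, 2.0]
machine_ratings = [6.5, 4.0, 5.5, 7.5, 2.5]
print(pearson(human_ratings, machine_ratings))
```

A value near 1 would mean the machine scores rise and fall with the human scores; fine-tuning the model would aim to push that number up.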