Take a moment to consider the economy in which we live. The global economy, the US economy, the Colorado economy. The connections between all the different parts. It's kind of a mind-boggling amount of complexity. The US produces about 20 trillion dollars of goods and services every year, and those are just the ones we count; we miss plenty along the way. I've been fortunate in my career to work on some pretty big, complicated models that try to make sense of all that: two, three, four thousand variables. But what becomes clearer the further you dive into that type of work is that it gets more and more difficult with every variable you add.
Here's a pretty preeminent economic forecaster you may have heard of: Alan Greenspan, who was in charge of the US Federal Reserve for many years. He was being interviewed by one of my favorite newsmen, Jon Stewart, and he said basically, "Look, I can do all of these fancy equations, I can get all these variables, but if you could just tell me how people feel, how they are reacting to the world around them, what their emotional mood is that day, I could really start to make sense of the economy." And this was the challenge that my team decided to take on. Can we start to figure out how the country feels today? How does Colorado feel today? So this is the work that we started to dive into, and I'm happy to share with you today a few of the things that we've started to discover.
So, the first thing I often get asked is: if you're using things like Twitter data, social media data, to figure out how we all feel and come up with an index of the mood of the nation, isn't that just polling? Why don't you just call people on the phone, ask them how they feel, and then put it all together? Well, there are a few problems with that. One, raise your hand if you still have a land line at home. That's going to be highly correlated with age. I'll just tell you that I got rid of my land line 15 years ago, and I'm probably never going back.
It's a big problem for pollsters, right? They are trying to call people, trying to get in touch with people and get a scientific sample, and that's great. We need scientific polling to answer lots of important questions, but that work is getting more and more difficult. The other thing that is really hard about polling is the Hawthorne effect: if you know you are being observed, you change your behavior. This goes back to worker-productivity studies at the Hawthorne Works factory outside Chicago in the 1920s and '30s, for the historians in the room. If you're being watched, you're going to work a little bit faster, right? If you're getting a call from a pollster and he says, "How do you feel today?", you might say, "Well, you know, I'm doing all right, I'm doing OK." And so the results can get a little bit skewed.
What we were asking was, "OK, what if we could passively monitor people just by the words they are using on social media, and figure out their mood from the way they use language?" That's exactly what we set out to do with this project. Now, there are some problems on the social media side, too, right? I mean, it's nice because it's immediate: every second, thousands of tweets are being sent. In fact, there are about six hundred million tweets a day now around the globe. I'm sure thousands are being sent as I'm speaking, right now.
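The talk doesn't say exactly how words were turned into a mood score. One common approach, shown here only as a minimal sketch, is a lexicon-based scorer: each word in a word list carries a mood value, each tweet gets the average value of the words it matches, and the index is the average across tweets. The word list `MOOD_LEXICON` and its values, and the functions `tweet_mood` and `mood_index`, are hypothetical illustrations, not this project's actual method.

```python
import re
from statistics import mean

# Hypothetical toy lexicon: word -> mood score in [-1, 1].
# These values are made up for illustration; real systems use large, validated word lists.
MOOD_LEXICON = {
    "happy": 1.0, "great": 0.8, "love": 0.9, "ok": 0.2,
    "sad": -0.8, "angry": -0.9, "tired": -0.4, "awful": -1.0,
}


def tweet_mood(text):
    """Average mood of the lexicon words found in one tweet, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [MOOD_LEXICON[w] for w in words if w in MOOD_LEXICON]
    return mean(scores) if scores else None


def mood_index(tweets):
    """Average per-tweet mood over all tweets containing at least one lexicon word."""
    scores = [s for s in (tweet_mood(t) for t in tweets) if s is not None]
    return mean(scores) if scores else None


if __name__ == "__main__":
    sample = [
        "Love this sunny day, feeling great!",
        "So tired and sad today",
        "Just got to work",  # no lexicon words, so it is skipped
    ]
    print(round(mood_index(sample), 3))
```

Tweets with no lexicon words are skipped rather than counted as neutral, which keeps unrelated chatter from diluting the index; a real system would also need to handle negation, slang, and emoji.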
Right? Everybody is live tweeting? You're being watched; that's the takeaway from this talk. But if you add up all of these Twitter users around the globe, you can get really instantaneous feedback on how people are feeling. There's a problem on that side, too, though: what if you're oversampling, counting too many people who don't represent the full population? Well, when Twitter started, that was a huge problem, right? It was the 25-year-old white guys in San Francisco; they were the only ones tweeting. The good news is that since then, Twitter's user base in the US and around the globe, in places like Western Europe and Saudi Arabia, has grown so much that we now have a sample that looks a lot like the rest of the population.
So, here in 2009, you can see a kind of heat map of the predominance of male use on Twitter. In the US right now, users are actually 51% female and 49% male, so this is almost perfect. If anything, women are slightly over-represented, which probably makes it a better sample. The other problem that comes up is around ethnicity. What if there are too many Caucasians, or too many of whatever group? The other good news is that Twitter, in the US in particular, now looks basically the same as the general population in its ethnic makeup. So this data is getting better every single day. The other thing that's happening is that we've got global use. These charts are a little bit tough to read, but the red line, which shows US and Canadian tweets as a percent of the total, is going down. It used to be 67 to 80%; now less than a third of all tweets are sent from the US and Canada. So it's a democratizing user base around the globe.