Making Data Mean More Through Storytelling: Ben Wellington at TEDxBroadway (Transcript)

Ben Wellington

Here is the full transcript of data scientist Ben Wellington’s TEDx Talk: Making Data Mean More Through Storytelling at the TEDxBroadway conference. This event occurred on February 23, 2015 in New York.

Ben Wellington – TRANSCRIPT

Hi everyone. As it says, I am Ben Wellington, and I am a data storyteller. A data storyteller.

If you had asked me a year ago what a data storyteller was, I probably would have said I had no idea. So today, I am going to tell you about my journey over the last year, in which I accidentally became a data storyteller. I’ll tell you what I learned along the way, and maybe convince some of you that you too can be data storytellers if you are curious and want to.

A little about my background. First, I work at an investment and tech company called Two Sigma, where I do data science. So that is one part of my world. But I also married an urban planner, so I’ve got the computer science world and the urban planning world, and for most of that time these two parts of my life were pretty separate. That was just the way it was.

Until something interesting happened here in NYC: in 2011, then-Mayor Bloomberg signed legislation called the Open Data Laws. The Open Data Laws are really exciting for people like me because they take data that is inside City Government and suddenly allow anyone to look at it. Before, the government would analyze something and tell us, “Hey, this neighborhood has this many accidents”; now we can see, data point by data point, what is happening at a very local level.

When these two things came together, they ended up... By the way, I should point out that there is an open data portal that anyone can go to, the NYC Open Data portal, with data sets on all sorts of things. In fact, there is one on the size and location of the televisions in Times Square. I don’t know what to do with that, but it’s really cool. There are data sets of all different types, over 1,200 so far, and the number is growing all the time.

So I took this data science work and my interest in urban planning, from discussions with my wife, and put them together in a blog called I QUANT NY. Awesome, thank you. One of the first things I did was this map, a map of cycling injuries in New York City. Red areas are where more cyclists were getting into accidents.

I found this in the public data and mapped it all. I noticed a few things. One, there were more cycling accidents and injuries on the East Side of Manhattan, because that’s where more cyclists come off the bridges. But there were also some other hot spots, like Williamsburg in Brooklyn and Roosevelt Avenue in Queens. I wrote about that and posted it on the blog. It was really just for me to learn how to do mapping with an open source program called QGIS, which I wanted to learn. And when I posted it, something interesting happened: people started to write about it. Gothamist covered it, Brokelyn claimed it was a “death trap”, which is not exactly what I said, then Streetsblog, and even The Atlantic. This was just a blog I’d put on Tumblr, and it had no followers, so that was really interesting.

From there I started to ask, “Why did people write about that? I’m not the first one to analyze cyclist accidents or to make maps like this.” What I did wasn’t that complicated, so what made it spread? I thought about it, and over the next few posts I got to see what was spreading, and I realized there was a third part to this that was really, really important. You probably won’t see it coming: improv comedy.

Yes, improv comedy. I have been doing improv since summer camp, at a camp called French Woods (Yeah! French Woods alumni!) in upstate New York. I started when I was about 13, and I’ve learned a lot of things from improv. I realized I was bringing those lessons into my writing, into this data science, to make people more interested in it. I think that to spread science, you need to be able to tell stories, so I am going to tell you how improv relates to data science, and how the two can come together to tell better stories.


That’s why I call it data storytelling. First, in stories you want to connect with people’s experiences, right? In improv, you learn that if you do a scene about brushing your teeth next to your wife, something people can relate to, they recognize how they look when they’re brushing their teeth, and you can build a scene from that. People relate to it because they’ve had the experience. I tried to write about things that New Yorkers experience, and I figured, “What do New Yorkers experience more than Duane Reade?” Right? I thought this might be interesting, and it turned out it was. I mapped every single building in New York to its closest pharmacy, and colored each building by that pharmacy’s chain.

Orange is Duane Reade, red is CVS, blue is Walgreens, yellow is Rite Aid. First of all, this neighborhood is Duane Reade country. No question. I also learned that CVS and Rite Aid are attacking from the water. Good strategy, CVS: come in from the Hudson. Duane Reade will not see you coming. I thought to myself, “Is Duane Reade really a New York thing?” It turns out, no, it’s a Manhattan thing. If we zoom out even more, the Bronx is Rite Aid country. Brooklyn and Queens are a patchwork. If you work for one of those companies, it’s probably very interesting.

For me, it’s interesting just to wonder what our experiences are, and how we can quantify them and tell stories about our lives. That’s something we can all relate to, and that’s part of storytelling. Even in data analysis, I’ve noticed that you want to focus on a single idea. In an improv scene, you can try to have seven ideas going, but things get lost pretty quickly. In my work I try to focus on one idea. For example, I looked at Citi Bike data.

It’s interesting: there are people leaving and arriving at stations, and there is a lot of data. What if we take one idea, and that idea is gender? Here, I mapped the percentage of male and female Citi Bike riders in NYC. We can see that in this neighborhood, over 80% of the riders are male. This is a very male-dominated Citi Bike neighborhood. What does that tell us? It could be about our transportation infrastructure. It could be a study of gender here in the city. Also, if you are looking to meet a girl on a Citi Bike, go to Brooklyn; that’s important. The important thing here is that it is one idea: just gender. There are many columns, it’s a big data set, but let’s study just one. The other thing is to keep it simple.

Not just one idea, but one simple idea. Ideas can get very complex, and you find yourself in improv thinking, “This guy goes here, and this, and this,” and you lose everybody very quickly. So I try to keep it simple. When people hear that I do math, they often think that I do this. But it is more like this. I mean, I just count things. I do sums, maybe a percentage. This is all just high school math; it is not crazy math. Anyone can do this if they stop and start to ask questions. An example: I looked at the percentage of parking tickets given to out-of-state plates in every precinct in New York.

We see that in this neighborhood, a higher percentage of tickets go to people coming in from out of state. That’s telling: those are more likely people driving into Midtown, and as we get farther out in the boroughs there are fewer travelers, which also makes sense. I also wanted to do this per state. So first I did New Jersey. Midtown, yes: the data shows that people from New Jersey are driving into Midtown.

Connecticut: a completely different picture. They’re coming in from the north; you can actually see it. They are going to the Botanical Garden in the Bronx. And my absolute favorite, Californians: where do you think they hang out? The hippest parts of Brooklyn. Williamsburg, Bushwick, Greenpoint: that is where people from California get parking tickets.


Looking at our data can tell us so much about our city. Also, explore the things that you know best. You all come from different fields. You know the area you study, the area you work in, very well. I know New York: I have lived here for over a decade, so I focus on New York. In improv, if you are a lawyer and you go into a scene as a lawyer, that scene will be good, because you know all the vocabulary. You can play it at the very top of your intelligence and just hit it. I tried other cities, but it is hard because I don’t have any context. For today, I did an analysis of Times Square, and I thought, “What can we all relate to?” Maybe catching a cab.

I was curious where people catch cabs around Times Square. That is 8th Avenue over here. You can see that, in general, people are catching cabs on 8th Ave when they are leaving this district, and less so heading south on 7th Ave. The big yellow blob in the lower left is the Port Authority, so that makes sense.

But really, people are heading toward the avenues: that is where people catch cabs. Where do you think people get out of cabs? They get out more on cross streets; it is much more of a grid. If you are catching a cab somewhere, especially if you are a tourist, you give the driver the address and they bring you there. They’re not dropping you off on the corner and saying, “Good luck, buddy.” That doesn’t happen.

What I really, really liked about this was that, depending on the direction of the street, people got out of cabs differently. Here is 7th Ave: if you come in on 46th, going west to east, you seem to get out on the west side of the avenue. Why? Ever get stuck in a cab in traffic? You just get out.

You can actually see it in the data: if you come in from the other side, people get out on that side of the avenue. Depending on the direction of the street, people are getting out of taxis, probably because they are waiting at lights. And this is interesting, right? If you advertise in the district, you might want to know where to welcome people, where they get out most often. We can start to study this with our public data. And with all this, you want to try to make an impact.

I’ve tried to make an impact on City Government by doing some of this work. Each of you has your own ways to make an impact. In particular, I did a proof, a mathematical proof, that no matter how many times you ride the subway and refill using the machine’s buttons, you cannot get a $0.00 balance. It is not just you, or you, or you. You literally cannot get a $0.00 balance if you use their buttons. There is a trick: you can type in $19.05 and get a zero balance. When I wrote about this, the MTA responded.

I said, all right, maybe this is an impact. And they said: “These machines do not hold an infinite amount of change and the denominations are suggested to insure there is ample change to accommodate customers who pay with cash. That being said, we will certainly look at this as part of the process involved in rolling out the next scheduled fare increase slated for next year.” So the fare increase is coming. Imagine that in March you say, “I want $20 on a MetroCard,” and you get exactly that. “How much would that be?” And they say, “$18.43.” And you pay. As opposed to now: “I’m going to pay $20.” “We’ll give you a random amount above that.” Imagine if we switched that; we could run our city better.
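The talk doesn’t spell out the MetroCard arithmetic, so here is a small Python sketch of why $19.05 works. It rests on figures the talk does not state, so treat them as assumptions: a $2.50 base fare and a 5% bonus on purchases of $5.00 or more, with the bonus rounded to the nearest cent. The function names are illustrative only.

```python
# Sketch under ASSUMED rules (not stated in the talk): $2.50 base fare and a
# 5% bonus on purchases of $5.00 or more, rounded to the nearest cent.
# All amounts are integer cents, so there is no floating-point error.
FARE_CENTS = 250
BONUS_PERCENT = 5
BONUS_MIN_CENTS = 500


def card_value(paid_cents: int) -> int:
    """Value loaded onto the card, in cents, for a cash payment."""
    if paid_cents < BONUS_MIN_CENTS:
        return paid_cents
    # paid * 5 / 100, rounded to the nearest cent (halves round up).
    bonus = (paid_cents * BONUS_PERCENT + 50) // 100
    return paid_cents + bonus


def leftover(paid_cents: int) -> int:
    """Balance stranded on the card after every full ride is taken."""
    return card_value(paid_cents) % FARE_CENTS


def zero_balance_amounts(max_cents: int) -> list[int]:
    """Bonus-eligible payments up to max_cents that leave exactly $0.00."""
    return [p for p in range(BONUS_MIN_CENTS, max_cents + 1) if leftover(p) == 0]


if __name__ == "__main__":
    for paid in (1000, 2000, 1905):
        value = card_value(paid)
        print(f"pay ${paid / 100:.2f} -> card ${value / 100:.2f}, "
              f"{value // FARE_CENTS} rides, ${leftover(paid) / 100:.2f} left over")
```

Under these assumptions, paying $10.00 loads $10.50 (four rides, $0.50 left over) and paying $20.00 loads $21.00 (eight rides, $1.00 left over), while $19.05 plus a $0.95 bonus loads exactly $20.00, eight rides with nothing left. The result hinges on the rounding rule: if the bonus were always rounded up, $19.05 would load $20.01 instead.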

We’ll see if the MTA follows through; I’d love to make an impact there. I also found something strange: in half of the city’s cabs, the tip is based on just the fare and the surcharge. So if you get into a Verifone cab and hit the 20% button, you are paying 20% on top of the taxi fare plus a small surcharge. If the cab runs Creative Mobile Technology, the other half of the cabs, and you hit the 20% button, you are also paying 20% on top of taxes and tolls. So with two different computer systems, you pay more in tips in one setup than the other, because it calculates the tip on top of tolls.


Is this all a big deal? Well, those drivers are making $250 more a year in tips from this little difference in calculation. In half our cabs we are all paying a little more, and those drivers are earning a little more, which isn’t a bad thing, but it is kind of inequitable. When I pointed this out to the TLC, they said: “We appreciate the work that went into this analysis, and we’re giving it a thorough read.” Impact. I’m working on it.
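To make the tipping difference concrete, this Python sketch computes a 20% tip both ways the talk describes: on fare plus surcharge (Verifone) and on fare, surcharge, taxes and tolls (Creative Mobile Technology). The ride amounts are hypothetical, the names are made up for illustration, and real meter software may round differently.

```python
# Sketch only: the ride amounts are hypothetical, and real meter software may
# round differently. All amounts are integer cents.
from typing import NamedTuple


class Ride(NamedTuple):
    fare: int
    surcharge: int
    tax: int
    tolls: int


def percent_of(base_cents: int, percent: int) -> int:
    """percent% of base_cents, rounded to the nearest cent (halves round up)."""
    return (base_cents * percent + 50) // 100


def tip_fare_and_surcharge(ride: Ride, percent: int) -> int:
    """Tip base as the talk describes Verifone cabs: fare plus surcharge."""
    return percent_of(ride.fare + ride.surcharge, percent)


def tip_including_tax_and_tolls(ride: Ride, percent: int) -> int:
    """Tip base as the talk describes CMT cabs: taxes and tolls included too."""
    return percent_of(ride.fare + ride.surcharge + ride.tax + ride.tolls, percent)


if __name__ == "__main__":
    ride = Ride(fare=2000, surcharge=50, tax=50, tolls=550)  # hypothetical ride
    a = tip_fare_and_surcharge(ride, 20)
    b = tip_including_tax_and_tolls(ride, 20)
    print(f"Verifone-style tip: ${a / 100:.2f}, CMT-style tip: ${b / 100:.2f}")
```

For this hypothetical $20.00 fare with a $0.50 surcharge, $0.50 tax and $5.50 in tolls, the first rule gives a $4.10 tip and the second $5.30; the gap is simply 20% of the taxes and tolls.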

And my favorite was this: I mapped fire hydrants in New York City. These aren’t just any fire hydrants; they’re mapped by the amount of parking ticket revenue they generate. These are the top 250 culprits in NYC. First, watch out on the Upper East Side: the 19th Precinct will ticket you at a hydrant no matter where you park.

More interesting were these two hydrants down on the Lower East Side, which were generating $55,000 a year in tickets. Two hydrants, $55,000 a year, for about 5 or 6 years. The data is public, so I had a look, and when I went to figure out what was happening, it turned out that there is a hydrant, then a bike lane, and then a parking spot. So you park thinking you are not in front of the hydrant, since there’s a bike lane between you and it. It turns out that while the DOT painted a parking spot, the NYPD disagreed.

So they ticketed the spot for years and years. This is actually a shot from the Google Street View car going by, and it caught the ticket, which I really appreciated. I wrote about that, and I heard from the city again: “While DOT has not received any complaints about this location, we will review the roadway markings and make any appropriate alterations.” That’s an action statement; we’re getting better. I thought to myself, “Well, Government, I’ll keep trying to make an impact.”

And, suddenly, they repainted the spot. Yeah! Impact! Somebody is listening; this is great. You can look at your data and you can have an impact. Sometimes your message doesn’t get through, but where I know a statement was made, I know I’ve at least changed some thinking in these agencies, that somebody is thinking about these things. I consider that making an impact as well.

To do that, once again, I think you really need to think about storytelling: connect with people, convey one idea, keep it simple, and explore the things that you know best. In case you think this whole data thing isn’t for you, that it’s for computer people: the open data portal is easy to use. I teach a statistics class for urban planners at Pratt, and on the second day of class, one of my students turned in this analysis of injuries by vehicle type, built from a list of accidents in the neighborhood around Pratt. That came from just one or two classes using Excel; it is not some crazy programming thing. And if you are scared even of that, this is a data set of 311 complaints about graffiti.

A student named Abby, in a program called City Term, a high school semester program that brings students to learn about the city, created this map of where graffiti is in NYC. So if you are scared of computers, that’s OK, too. It is not about being a computer person; anyone can take data. You just have to know what questions to ask and try to tell your own story. I just hope everyone realizes they can be a data storyteller. Thank you.
