I recently did a lunch and learn at my co-working space, and these are the slides that I used. This post is a little different from the content in the slides, but the gist is the same. Hope you enjoy reading it!
I’ve seen a lot of wonderful data visualizations on websites like Flowing Data or The Pudding, but I never had the desire to do my own visualizations. The data seemed daunting and inaccessible—I mean, where do I even get that kind of data? How do I even make sense of those big data sets? So I put any hopes and dreams of me doing interesting visualizations in the stuff-I’ll-never-get-to-do bucket.
But then I ran into a book called Dear Data. It was a year-long project by two women who each drew their personal data and sent it across the Atlantic. Here’s a video that describes the project well:
I was inspired. It hadn’t occurred to me that I could visualize my own personal data instead of downloading some big data set. I’ve coincidentally been interested in public transit recently, and I thought that visualizing my commuting habits would be a good start.
I happened to know that most of the trips that I’ve made are recorded online because I use a digital key card to pay for all modes of transportation here in Philadelphia. That covers trains, buses, and trolleys.
The hardest part was exporting this data into something that I could play around with. I had to copy and paste each row into an Excel sheet before I could save it as a CSV. It was such an arduous task that if I were to do this again, I would use a tool like Puppeteer to scrape the data for me instead.
Getting my data from Lyft was thankfully easy. All I had to do was go into the app and export the rides as a CSV, which arrives as an e-mail attachment.
I was a bit stuck after I gathered all the data that I needed because I wasn’t sure how to create the visualizations that I wanted.
I vaguely knew of tools like Observable and D3, but the examples looked pretty daunting, especially since I didn’t know how to create SVGs from scratch. Fortunately, I ran into Vega-Lite, which made visualizations a little bit easier because I didn’t have to hand-write the SVG graphs.
It took a bit of trial and error before I got the hang of it, but the first thing that I was able to make was a scatter plot showing all the train stations and bus stops that I’ve been to in the last nine months. In there, you can clearly see that I have been going to Girard Station – MFL most often since that’s where I live, but also 2nd St Station – MFL because that’s where I go to work.
Compressing that scatter plot to show just the modes of transportation, you’ll see that I only started using the bus and the trolley in October and November of 2018. I used to be very confused by the bus system, but apps like Citymapper and Transit App have made it a lot more accessible for me.
One thing that I really wanted to know was when I took public transit, by day of the week and time of day, and thankfully Vega-Lite makes this easy. They even have an example of it!
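For anyone who wants to try something similar, here is a minimal sketch of what a Vega-Lite scatter plot spec can look like, built as a Python dictionary and printed as JSON. The rows and the column names (`date`, `station`, `mode`) are made up for illustration; the real key card export will be shaped differently.

```python
import json

# Hypothetical sample rows; the real export's columns and values will differ.
trips = [
    {"date": "2018-10-01", "station": "Girard Station – MFL", "mode": "train"},
    {"date": "2018-10-01", "station": "2nd St Station – MFL", "mode": "train"},
    {"date": "2018-10-03", "station": "Girard Station – MFL", "mode": "train"},
]


def scatter_spec(rows, y_field="station"):
    """Build a Vega-Lite spec with one point per trip: date on x, y_field on y."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            "x": {"field": "date", "type": "temporal"},
            "y": {"field": y_field, "type": "nominal"},
            "color": {"field": "mode", "type": "nominal"},
        },
    }


spec = scatter_spec(trips)
# Paste the printed JSON into the Vega-Lite online editor to render it.
print(json.dumps(spec, indent=2))
```

Calling `scatter_spec(trips, y_field="mode")` puts the mode of transportation on the y axis instead of the station, which gives the more compact view.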
The result was pretty, but it was a bit disappointing to see how random my trips are. There is some insight in there, though: it looks like I don’t travel much on Tuesdays or Thursdays, but I travel a lot on Friday, Saturday, and Sunday to hang out with people and do chores.
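A day-by-hour chart like this can be sketched as a Vega-Lite spec that uses `timeUnit` to bucket one timestamp field two ways and counts the rows in each cell. The sketch below assumes a hypothetical `time` column with ISO timestamps, and it also tallies trips per weekday in plain Python as a sanity check on what the chart should show.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps standing in for the real trip log.
trips = [
    {"time": "2018-10-03T18:30:00"},  # Wednesday
    {"time": "2018-10-05T08:15:00"},  # Friday
    {"time": "2018-10-05T19:45:00"},  # Friday
    {"time": "2018-10-06T11:00:00"},  # Saturday
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_hour_spec(rows):
    """Vega-Lite spec: hour of day on x, day of week on y, trip count as color."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "rect",
        "encoding": {
            "x": {"field": "time", "timeUnit": "hours", "type": "ordinal"},
            "y": {"field": "time", "timeUnit": "day", "type": "ordinal"},
            "color": {"aggregate": "count", "type": "quantitative"},
        },
    }


def trips_per_weekday(rows):
    """Count trips by weekday name, independent of the chart."""
    days = (datetime.fromisoformat(r["time"]).weekday() for r in rows)
    return Counter(DAY_NAMES[d] for d in days)


spec = day_hour_spec(trips)
by_day = trips_per_weekday(trips)
print(by_day.most_common())
```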
Another thing that I wanted to see was how my transit expenses have changed over the last couple of months. I used to be an avid Lyft user because of its convenience, but it’s really hurt my wallet in the past. So I wanted to compare that data with my public transit data.
What came out was honestly pretty disgusting. I spent so much on Lyft in August and September that it hurts to look at it. This was pretty much my breaking point and why I’ve been taking public transit more. In October I said to myself that I wouldn’t spend that much money just to get around a city.
What’s also interesting is that I’m moving around more than ever. I graphed the number of rides I’m taking (that is, how often I commute), and I’m at an all-time high—all without the associated costs.
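Both of these comparisons, spend per month and rides per month, come down to grouping rows by month and service. Here is a minimal sketch of that grouping, assuming the Lyft and transit exports have already been merged into rows with hypothetical `date`, `service`, and `fare` columns. The fares below are placeholders, not my real numbers.

```python
from collections import defaultdict
from decimal import Decimal

# Placeholder rows; column names and fares are invented for illustration.
rides = [
    {"date": "2018-08-03", "service": "Lyft", "fare": "12.50"},
    {"date": "2018-08-17", "service": "Lyft", "fare": "9.25"},
    {"date": "2018-10-02", "service": "Transit", "fare": "2.00"},
    {"date": "2018-10-09", "service": "Transit", "fare": "2.00"},
    {"date": "2018-10-20", "service": "Lyft", "fare": "8.00"},
]


def monthly_totals(rows):
    """Sum fares and count rides for each (YYYY-MM, service) pair."""
    totals = defaultdict(lambda: {"spend": Decimal("0"), "rides": 0})
    for row in rows:
        key = (row["date"][:7], row["service"])  # "2018-08-03" -> "2018-08"
        totals[key]["spend"] += Decimal(row["fare"])  # Decimal avoids float drift
        totals[key]["rides"] += 1
    return dict(totals)


totals = monthly_totals(rides)
for (month, service), t in sorted(totals.items()):
    print(month, service, t["spend"], t["rides"])
```

The same totals can feed two charts: one that plots `spend` and one that plots `rides`.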
You can see that I still take some Lyft rides, but these days I only use them when I’m in a hurry or if I’m carrying huge bags of groceries. There’s research that says that ride-sharing and public transit are complementary. I agree.
Wouldn’t it be cool to see all of my trips on a map? Now I don’t know much about mapping, but Leaflet seemed like a good place to start, so I read up on that. Unfortunately, I had to manually match each station to lat-lng coordinates that I looked up on Google Maps. It was tedious work, but I did manage to get a heat map working.
In the heat map below, you’ll see that I’m generally in three locations: home, work, or Center City. No surprises there.
Zooming in closer shows the specific stations that I use. Everything looks right aside from the fact that 2nd St. Station is missing, so I might have made a mistake on the coordinates there.
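A missing station like this is easier to catch with a check than by squinting at the map. The sketch below turns station names into `[lat, lng, intensity]` triples, the input shape that heat-map plugins such as Leaflet.heat accept, and reports any name that has no match in the lookup table. The coordinates are rough placeholders, and the spelling mismatch ("St." versus "St") is only one guess at what could go wrong, not a diagnosis of my actual bug.

```python
from collections import Counter

# Rough placeholder coordinates; look up real values before relying on them.
STATION_COORDS = {
    "Girard Station – MFL": (39.9689, -75.1363),
    "2nd St Station – MFL": (39.9497, -75.1437),
}

# Hypothetical tap log; note the extra period in the second spelling.
taps = [
    "Girard Station – MFL",
    "2nd St. Station – MFL",
    "Girard Station – MFL",
]


def heat_points(station_names, coords):
    """Return ([lat, lng, count] triples, sorted names with no coordinates)."""
    points, missing = [], []
    for name, count in Counter(station_names).items():
        if name in coords:
            lat, lng = coords[name]
            points.append([lat, lng, count])
        else:
            missing.append(name)  # would silently vanish from the map
    return points, sorted(missing)


points, missing = heat_points(taps, STATION_COORDS)
print("heat points:", points)
print("stations without coordinates:", missing)
```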
So in Philadelphia there’s another mode of transportation that I haven’t talked about: bikes! I’m a bit too scared to ride a bike in the city (for now), but the City of Philadelphia publishes data every quarter on all the bike-share trips made.
So I took that data and looked at what would happen if I simply plugged it into my existing graphs and maps.
The heat map generated is interesting. With a few tweaks it shows that a lot of the trips (at least in the first quarter of 2018) are concentrated in the center of Philly. There are some blips in University City on the left of the blob, and some in the museum area on the upper left corner of the blob.
I also wanted to know when people borrowed the bikes. If you asked me, I would’ve assumed that people used the bikes more on the weekends or for leisure. But when I ran the data, it showed that people use them mostly for work. You can clearly see the 9am and 5pm crowd, and you also see usage dying down on the weekends.
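The commute pattern can also be checked without a chart by counting trip start hours separately for weekdays and weekends. This is a minimal sketch that assumes each row has a `start_time` column in `YYYY-MM-DD HH:MM:SS` format; the real quarterly files may use a different column name or date format, and the rows here are invented.

```python
from collections import Counter
from datetime import datetime

# Invented rows; the real file has one row per bike trip.
bike_trips = [
    {"start_time": "2018-01-08 09:05:00"},  # Monday
    {"start_time": "2018-01-08 17:10:00"},  # Monday
    {"start_time": "2018-01-09 09:40:00"},  # Tuesday
    {"start_time": "2018-01-13 13:20:00"},  # Saturday
]


def hourly_profile(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (weekday, weekend) Counters mapping start hour -> trip count."""
    weekday, weekend = Counter(), Counter()
    for row in rows:
        start = datetime.strptime(row["start_time"], fmt)
        # datetime.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        bucket = weekend if start.weekday() >= 5 else weekday
        bucket[start.hour] += 1
    return weekday, weekend


weekday, weekend = hourly_profile(bike_trips)
print("weekday trips by hour:", sorted(weekday.items()))
print("weekend trips by hour:", sorted(weekend.items()))
```

If commuting dominates, the weekday counter should peak around hours 9 and 17 while the weekend counter stays flatter.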
It was interesting because I assumed that people who biked to work owned their bikes. But at $17 a month, it looks like Indego, the city’s bike-share program, is a good deal for people who don’t want to pay upfront for a bike, do maintenance on it, and worry about it getting stolen.