A Day of Data

If you’re interested in keeping up with my writing, sign up for my newsletter!

I figured that since I work for a privacy company, I should increase my own awareness of where my personal data goes. I wanted to figure out which companies and organizations have, in one way or another, received, processed, and stored my personal information.

Of course, this isn’t an exhaustive list of every single company out there that knows me because that would be an overwhelmingly long list. Not only am I leaving out every third-party ad and analytics software that apps and websites use (I just included Google Analytics and Chartbeat in my findings), I’m also leaving out the impossible task of knowing the data-sharing practices of the companies that already have my data in the first place. (Chase, for example, shares data with their marketing partners to send you advertising.)

I’m not trying to say that sharing your personal data is necessarily bad. Your doctor, for example, needs a ton of data from you like past visits, current medications, family history, insurance plan, pharmacy of choice, payment method, test results, and your contact information. The point of all this is to figure out just how much my life is dependent on sharing my personal data, how necessary it is in my day-to-day life.

So what I have below is a 12-hour diary of my online activities on September 5, 2019.

6am

Xfinity can see everything that goes in and out of the network because they provide my Internet at home. They can see all the connections that my phone was making while I was asleep. They know which apps on my phone connect to the Internet to fetch new information like my messages, podcasts, and the weather in the morning.

6:57am

At around 6:57am, I reply to people on Telegram Messenger. Xfinity knows that I use Telegram—including the time when I used it—because it passes through their network. Telegram, on the other hand, knows my contact list, the people I’m talking to, and when I contacted them.

7:21am

At 7:21, I turn on Headspace to meditate. Again, this is something that goes through Xfinity’s network since the meditations are downloaded from Headspace’s servers as needed. Headspace knows the time when I opened the app, the type of meditation that I played, and whether I’m a paid subscriber. It also keeps track of all the meditation courses that I’ve ever played, and the number of days I’ve meditated consecutively.

7:36am

I look at my transactions and my budget every morning on a budgeting app called You Need A Budget. This means that they have a running list of all my personal transactions and financial information: How much I spend on different categories, my spending habits over time, where I spend my money, and even the things that I buy. (I put notes beside some transactions to help me remember what I spent the money on. An example would be: “Amazon.com, pressure cooker.”)

Oh, when I say “where I spend my money,” I also mean that literally. To make manual entries easier, YNAB remembers where my phone was each time I manually entered a transaction. It was creepy when I first noticed it, but I actually find this super helpful.

Like most web apps, YNAB uses Google Analytics to learn about customer behavior and usage. I’m pretty sure that it’s on the website version of the app (which you can block on your browser), but I don’t know if the mobile app uses it so I may or may not have hit Google Analytics.

YNAB also has access to my accounts on Wells Fargo, Ally Bank, and Vanguard so that I don’t have to enter all my transactions by hand.

7:57am

At 7:57am, I leave the house, which means my Internet connection switches from my WiFi to my cellular provider, Ting Mobile (which, in turn, uses T-Mobile towers).

And just like Xfinity, this means that they can now see the connections that the apps on my phone make when I’m on the go.

8:33am

On my way to the train station, I text someone on WhatsApp (hello, Facebook). Just like Telegram, they have information on when I texted, who I’m texting, and my contacts list. Even though they don’t have access to the actual text message I’m sending, they still have all that metadata, which they can use to learn about my connections with people so that they can show me relevant ads.

8:41am

I take the train at 8:41am using a digital keycard which keeps track of all the train stations and bus stops that I go to. You can purchase these cards and reload them using cash if you’d like to have more privacy, but I linked my transit card to my credit card and e-mail address so that I could get a refund on lost cards and be able to set it to auto-renew at the end of each month.

Note: The image above is a bit misleading. I put SEPTA beside Ting Mobile, but Ting doesn’t actually get any direct information on my commuting habits.

9:04am

I go to a popular convenience store called Wawa before heading into my co-working space, and I buy a breakfast burrito using my credit card. Wawa knows that my method of payment is Apple Pay and that the payment network is Mastercard, but I’m a bit unclear on whether they know the issuing bank (Goldman Sachs).

Update: Gosha sent me an e-mail about the BIN, or Bank Identification Number, on your credit card. The first 6 digits of your card number identify the issuing bank.
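To make the BIN idea concrete, here is a minimal sketch of pulling those leading digits out of a card number. The function name and the tiny lookup table are made up for illustration; a real merchant would rely on their payment processor's BIN data, and the card below is a widely published Mastercard test number, not a real card.

```python
def bank_identification_number(card_number: str) -> str:
    """Return the first six digits of a card number (the BIN)."""
    # Drop spaces, dashes, and anything else that is not a digit.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) < 6:
        raise ValueError("card number is too short to contain a BIN")
    return digits[:6]


# Hypothetical lookup table: this mapping is invented for the example.
EXAMPLE_ISSUERS = {"555555": "Example Bank"}

card = "5555 5555 5555 4444"  # a well-known test number, not a real card
card_bin = bank_identification_number(card)
issuer = EXAMPLE_ISSUERS.get(card_bin, "unknown issuer")
print(card_bin, issuer)
```

So anyone who sees the full card number could, in principle, look up the issuing bank from the leading digits alone.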

What I am sure of though is that Goldman Sachs knows that I went to Wawa, what time I went to Wawa, the location of that Wawa, and how much I paid for that burrito at Wawa.

I also budget on YNAB on my phone while I’m in line for my burrito, so YNAB now knows the same details that Goldman Sachs does. Ting is also now aware that I use YNAB since I needed the Internet to access my budget.

9:08am

I send a text on Telegram as I walk to my co-working space, so Ting also knows that I use that app, too.

9:54am

I connect to the WiFi in the co-working space, which is provided by a local ISP called Philly Wisper. I then log on to the websites that I need for work like Asana, the company chat room, and the company calendar on Fastmail.

Generating digital data is, of course, unavoidable when I’m doing remote work since collaborating, planning, and updating work with my colleagues mostly happen on the Internet.

10:56am

At 10:56am, I play some music on Spotify. This particular day was mostly songs from Tyler the Creator, Tierra Whack, Solange, and The Internet. Spotify, of course, knows my listening habits which they use to generate individualized playlists that I might like. Spotify also has business partners that can place cookies on your device whenever you’re on Spotify. This is used to “deliver advertisements more relevant to you and your interests.”

I also start visiting other websites to find inspiration for data visualization for this very article. The websites that I visit, just like YNAB, use analytics software like Google Analytics and Chartbeat to learn about their readers.

12:02pm

At around lunchtime, I call CVS Pharmacy and ask them to get my prescriptions from Walgreens Pharmacy. So now, CVS knows all the medications that I take, who my doctor is, and who my insurance provider is.

Also, that call went through Ting Mobile, who now knows that I made a call to a nearby CVS at 12:02pm for 6 minutes.

12:14pm

I check the Dark Sky app on my phone to see if it’s going to rain this afternoon. Since it’s a location-specific app, it knows the city that I’m currently in.

12:28pm

I walk outside and get a flu shot at CVS (the same CVS branch that now has my medications). I get the vaccine for free because my insurance provider covers the full cost of it. So now both CVS and my insurance provider know that I just got a flu shot at this location and at this time.

1:46pm

At 1:46pm, I get charged for a recurring monthly bill from my therapist’s office at Thriveworks. They see how often I come in for therapy, who my therapist is, my insurance details (Independence Blue Cross), and my payment details (Mastercard).

2:48pm

I use an app called Transit to see what time the bus will come. This app keeps track of where I am (crowdsourced and real-time data is important to them), the possible routes that I can take, where I’m going, and a list of buses and trains that I frequent. Fortunately, they don’t really know who you are unless you send them feedback.

I then hop on the bus using my SEPTA keycard, so the transit agency knows the time I hopped on, where I was waiting, and the bus number. If you noticed, I only mention SEPTA having data at the start of the trip and not the end when I hop off the train or bus. This is because they don’t scan your keycard when you exit the bus or train.

3:23pm

I get home, prepare a late lunch, and watch some videos on YouTube. YouTube (which is owned by Google) keeps track of the types of videos that I watch so that it can improve its advertising and make better video recommendations. After all, the more videos I watch, the longer I stay on the site.

4:03pm

My last entry of the day is getting a notification for a recurring donation (what a mouthful) to a charity called GiveWell. Just like Wawa, they know my credit card number and that I used Mastercard. (Again, I’m unclear if they know about the issuing bank part which is Goldman Sachs.)

Goldman Sachs and Mastercard, on the other hand, know where I donate my money and how much I give away.

The charity also knows my contact details like my address, phone number, and my e-mail address.

Phew.

Visually laying out all the entities that have their hands on my data is a bit mind-boggling, and I’m sure that I’m missing a couple of hidden services that run in the background that I’m unaware of. What scares me the most is how often data breaches seem to happen and how reliant I am on all these digital services.

There are some alternatives out there that I can take, but none of them seem to be worth pursuing. I could start paying in cash so that banks and payment processors will have limited information on my whereabouts, but I find it too much of a hassle to carry cash and change these days. I could buy music instead of streaming on Spotify, but then I lose having access to virtually any song that I want for cheap. I could dump my smartphone for a less-capable phone to limit the apps that follow me wherever I go, but then I lose the convenience of having transit data, meditation courses, the weather, and my budget in the palm of my hand.

It looks bleak, but thankfully our values are always changing. People are waking up and realizing that digital privacy in this software-filled world is important, which means that people are starting to look for services that offer privacy. Goldman Sachs, for example, has been a recurring company in this log that I made, but at least they can’t share or sell the transactions that they have on me. DuckDuckGo doesn’t know what you’re searching and it blocks online trackers (like Google Analytics and Chartbeat) from following you around the web. Non-profits like Mozilla are creating a suite of privacy tools to help you understand and protect your privacy online.

Things seem to be going in the right direction when it comes to our digital wellbeing. I hope you enjoyed this visualization of my digital data diary, and I hope we both see a future with a little more privacy.

Where I spent my time during a really long flight

It was my fault, really. I don’t like flying, and I don’t sleep well on planes, but I still booked a 28-hour flight from Manila to New York. I honestly thought I was ready for it. I had melatonin pills, a neck pillow, a sleep mask, a blanket, and a pair of socks. Knocking myself out was my number one goal. That, I thought, was how I would survive this flight. While I did manage to fall asleep, I kept on waking up because some part of me would start to get sore from being confined in a chair for so long.

The first part of the round trip flight was actually a pretty good deal. For $800, I was able to go to Germany (stayed for 12 hours) and to Singapore (stayed for 5 days) before finally flying out to the Philippines. I got to visit countries that I’ve never been to before, and I also got to spend time with friends that I haven’t seen in a long time.

The coming back part, unfortunately, meant that I had to go from Manila to Singapore, Singapore to Germany, then Germany to New York. It was an unnecessarily long trip, and because I kept on waking up, I thought I’d do something fun instead. Like maybe log the activities that I did during the flight.

It’s unfortunately not the most accurate data because some activities overlap like eating while watching a movie. I also didn’t have the foresight to jot down the time when I would stop doing something, e.g., I would log the time when I started eating but not when I finished. But even then, I think the logs were still good enough to learn from.
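With only start times in the log, one rough way to estimate how long each activity lasted is to assume that it ran until the next entry began. The sketch below illustrates that idea and nothing more: the entries, the times, and the function name are all invented, and it isn't necessarily how the totals in this post were worked out.

```python
from datetime import datetime

# Hypothetical log entries (start time, activity), made up for illustration.
log = [
    ("2019-01-01 08:00", "eating"),
    ("2019-01-01 08:30", "watching"),
    ("2019-01-01 10:00", "sleeping"),
    ("2019-01-01 13:30", "miscellaneous"),
]
landing = "2019-01-01 14:00"  # hypothetical end of the log


def estimate_minutes(entries, end_time):
    """Total minutes per activity, assuming each one lasts until the next starts."""
    fmt = "%Y-%m-%d %H:%M"
    starts = [datetime.strptime(stamp, fmt) for stamp, _ in entries]
    starts.append(datetime.strptime(end_time, fmt))
    totals = {}
    # Pair every entry with its own start and with the following start.
    for (_, activity), begin, finish in zip(entries, starts, starts[1:]):
        minutes = (finish - begin).total_seconds() / 60
        totals[activity] = totals.get(activity, 0) + minutes
    return totals


totals = estimate_minutes(log, landing)
print(totals)
```

Overlapping activities (like eating while watching a movie) still get lumped into whichever one was logged last, so this keeps the same inaccuracy described above.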

I didn’t know how to represent the data at first. I wanted something more creative and outside of the usual charts that I’ve made before. But the other day, I ran into a Sketch plugin that created spirals. So I thought that maybe I could represent the data as a snail because of how unbearably long the flight was.

So after a couple hours of trial and error with both the 6Spiral and Looper plugins, I came up with a shell where each spiral represented 30 minutes of activity. Behold, my snail collection!

Since I didn’t log (more like forgot to log) absolutely everything, there is a “miscellaneous” category in there that includes a hodgepodge of things like going to the bathroom, daydreaming, listening to a podcast, brushing my teeth, and changing my clothes.

Looking at the visuals, I’m kind of surprised that I logged 9 hours of sleep. It wasn’t the restorative kind of sleep for sure, but it was still a substantial amount of time. I’m guessing a good part of that was spent tossing and turning in the seat.

Another surprising part is how small the music-listening and watching categories are. I went through several albums and TV shows on the plane, but I guess that didn’t really amount to much. Or it could be that I was too distracted to log. We’ll never know.

In the future, I could look into animating each spiral similar to this when I have the time. Anyway, I hope you never have to take such a long flight in economy class!

The Sketch file can be found here.

Visualizing data on global warming

I was jokingly telling my partner that the best way to help prevent climate change is to just be dead because using water (processing drinking water and waste), eating (farm equipment and goat burps), and using electronics (outside of child labor, manufacturing them and connecting to data centers) all have carbon emissions.

So how much CO2 do we generate? The world is a big place, so I wanted to look at the data in the US. It turns out that the US alone pumped out 6,870 million metric tons of CO2 in 2014, which is so much CO2 that you’d need 8 billion acres of forest, or 1/5th of all the land on Earth, to sequester it.
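As a quick sanity check on those numbers, here is a small back-of-the-envelope calculation. The emissions and forest figures come from the paragraph above; the Earth's land area (roughly 148.9 million square kilometers) and the acres-per-square-kilometer conversion are figures I'm assuming, not something taken from the post's sources.

```python
# Figures quoted in the post.
US_CO2_TONNES_2014 = 6.87e9   # 6,870 million metric tons of CO2
FOREST_ACRES_NEEDED = 8e9     # 8 billion acres of forest

# Assumptions (not from the post): approximate land area of the Earth
# and the standard acres-per-square-kilometer conversion.
EARTH_LAND_KM2 = 148.9e6
ACRES_PER_KM2 = 247.105

# Implied sequestration rate: metric tons of CO2 per acre of forest per year.
tonnes_per_acre = US_CO2_TONNES_2014 / FOREST_ACRES_NEEDED

# Share of the Earth's land that 8 billion acres would cover.
earth_land_acres = EARTH_LAND_KM2 * ACRES_PER_KM2
share_of_land = FOREST_ACRES_NEEDED / earth_land_acres

print(f"{tonnes_per_acre:.2f} metric tons of CO2 per acre per year")
print(f"{share_of_land:.0%} of the Earth's land")
```

Under those assumptions, each acre of forest would need to absorb a bit under one metric ton of CO2 a year, and 8 billion acres works out to a little over a fifth of the Earth's land, which lines up with the 1/5th figure.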

I wanted to drill down and visualize CO2 generation between different states, and thankfully there’s data on that from the EPA. Here’s a map of the amount of CO2 each state produced in 2016:

Here’s another way to visualize this. (Inspired by the number of robocalls that annoyed Americans last year.)

If we stack the emissions of some of the biggest CO2-belching states on top of each other, we start to see why the US generates so much CO2.

So what is producing all of this carbon? If we zoom in to a state like Pennsylvania, we can see that a large part of it comes from generating electric power and from transportation. The New York Times digs deeper into how each state generates its electrical power, and they show that 22% of Pennsylvania’s power comes from coal and 34% from natural gas.

With regards to the transportation piece, the EPA says that the “largest sources of transportation-related greenhouse gas emissions include passenger cars and light-duty trucks, including sport utility vehicles, pickup trucks, and minivans.”

Of course the ratios aren’t the same in every state. We can look at a different state like South Dakota, which has smaller CO2 emissions because it only uses 19% coal and 6% natural gas in the electric power sector. Most of its electricity now comes from hydroelectric (48%) and wind (27%).

Speaking of wind, I found some good data on wind energy, too. I had recently switched my electricity supplier to The Energy Co-op, and they had pointed me to USGS because I had asked them where the power is generated. It turns out they let you download all the locations of the turbines in the US and the power that they generate.

So I downloaded that and plotted all 59,338 turbines that were listed in there! I didn’t know there were that many to begin with.

Actually, we can take it up a notch and show all wind, solar, nuclear, hydroelectric, and geothermal power plants on the map using this data from the EIA.

This makes me hopeful for the future of renewables, and the data definitely shows that they’re growing. Right now, though, that growth is being negated by the reduction of nuclear energy (a low-carbon energy source), so the net share of low-carbon electricity production is unfortunately the same as it was a decade ago.

So basically the Earth is still heating up.

What now?

I’ve also been looking into what I could do, and from what I’ve been reading, policy changes and regulations look like our best bet to fight climate change.

In the end, though, experts do not believe the needed transformation in the energy system can happen without strong state and national policies. So speaking up and exercising your rights as a citizen matters as much as anything else you can do.

What is still largely missing in all this are the voices of ordinary citizens. Because politicians have a hard time thinking beyond the next election, they tend to tackle hard problems only when the public rises up and demands it.

The New York Times

To give an example, British protesters are urging politicians to declare a climate emergency, and it seems that the government is starting to listen.

Here in the US we can voice our opinions for politicians to push for policies like keeping the US in the Paris Agreement and the Green New Deal. Local policies are also important especially in states like Pennsylvania which generates and exports a lot of electricity.

I’ve personally been using 5calls, GovTrack, and ResistBot to contact state officials and Congress, and I’ve started to donate 1% of my monthly salary to organizations that advocate for policy change ($1,080 per year):

Another smaller, but no doubt important, way to help is through personal changes:

In Philadelphia, I learned that I could also choose an electrical power provider through www.papowerswitch.com so that my electric bill pays for wind and solar power. Philly also has access to composting through Bennett Compost and Circle Compost.


Source code: https://observablehq.com/@jagtalon/climate-change

Redesigning a form

One of the projects that I’ve worked on recently is the redesign of a fairly simple form. All it did was gather feedback from people who wanted to report a broken website from within the DuckDuckGo browser extension.

You’d click on the “Report Broken Website” button on the extension, and then a new tab would open with a form for you to fill out.

Forms aren’t all that exciting to be honest, but there’s room for creativity in designing them. Just take a look at the thoughtfulness put into Stripe’s payment checkout form or the range of ideas that you can find on Dribbble.

So with this project the main issue that we wanted to solve was friction: Can we increase the number of people reporting broken websites if we made it easier to do so?

Understanding the problem

People already had ideas on possible solutions and user flows before I even started on the project. So my job at the very beginning was to listen and understand the scope of the problem and the goals of the project.

For me that often meant grabbing a bunch of paper from my co-working space’s recycling bin to write down design ideas and to sketch out some possible solutions.

A part of the design process for me is also figuring out the copy and the tone and voice that we want in the UI. We figured that in this case we’d want to show empathy to the person reporting (because it sucks to run into a broken website) and to communicate that we’re not going to be harvesting all their personal information behind the scenes (because we don’t).

After I felt like I had a good handle on what I needed to do, it was time to go on Sketch.

Iterating on the designs

Forms are really just a bunch of input fields. Deciding which fields to put in there, however, can take some time. The first set of designs that I made had multiple elements:

  • A select element for indicating the type of breakage.
  • A textarea element for people describing their problem.
  • A checkbox element for people to opt in to sharing debugging data.

A screenshot of Sketch showing initial work on the form.

But after a few discussions and a couple of design variations, we figured that we could reduce the form down to just one element. First, we got rid of the checkbox because the submit button is how people opted in to sharing their data. (In lieu of that, we added a note saying that we weren’t going to collect any of their personal information.) Second, we figured that we already had the debugging information so there was no need for people to explain what went wrong.

This greatly simplified the form, and made it faster for people to report a broken website.

A second set of variations that have fewer elements in them.

Another part of the design that I had to look into was how people got to the form itself. What some people do when a website is broken is toggle the extension on and off to see what happens. We wanted to reach out to these people by asking them if they were toggling because they found a broken website.

Some ideas showing notifications above and below the extension popup.

The initial designs that I made used modals that imitated how smartphones would show notifications. I loved this idea because this sort of notification could be used to help and guide the user when they encounter different problems throughout the extension. But we agreed that it’s a bit overkill for now, so we went with a simpler inline version that comes up right below the toggle.

A simpler solution that was easier to implement and design.

Coming together

Now that the main parts of the project had been designed, it was time to create a user flow that shows the different scenarios that a user could run into when trying to open up this form. This is great for communicating the design to developers and other stakeholders so that everybody is on the same page.

A user flow showing scenarios in the UI like error and success screens.

Then we also asked Matt Anderson to create an illustration for the form and we all ended up liking this broken bike design! At this point, it was time for the developer to implement the designs.

All these changes are now live! So if you ever run into a broken website on the DuckDuckGo extension (I hope not!), you now know how that reporting page came to be.

Building a tiny device lab

I do most of my testing on platforms like BrowserStack. It works super well: having access to a ton of devices and browsers has been useful in finding bugs and ironing out some of the kinks that inevitably come up. Unfortunately, I’ve found it to be a bit lacking when I’m trying to test the “feel” of a website: if the animations and interactions are smooth, if interacting with the app feels natural, and if the buttons are too small for my stubby fingers.

While I obviously have my own personal devices to test on, they’re not enough. Not everyone has the latest and fastest phone, and not everyone uses the same browser. Ethan Marcotte explains this well:

Newer devices are often the best-represented among the teams I work with, which means they usually have, well, the best representation in the design process. But it’s the older ones that need our time and attention: because even though these devices are of a vanishing breed, if not an outright obsolete one, they can help us understand how our design decisions will impact devices that don’t look like ours. How they’ll impact users who aren’t quite like us.

Mozilla goes into the details:

Many developers believe the browser they use is the only browser that anyone really uses, therefore they should just develop for it. By some measures, 70% of web developers use Chrome on the desktop. But only about 50% of web traffic across all device types is on Chrome, and only about 62% of web traffic on the desktop is on Chrome. Building and testing only on Chrome alone ignores almost half of global users. (It’s worth pointing out here that different browser share trackers use different methodologies and produce different numbers, and the numbers change quickly and often.)

And browser use varies by geography. Chrome, Firefox and IE/Edge are the top browsers in many locales, but the proportion of users on each varies. German users favor Firefox over Chrome. IE is big in Japan. Quite a few Australians choose Safari. More than 1 in 5 Vietnamese users run a fork of Chromium called Cốc Cốc. Building and testing on just one browser ignores these market differences.

I looked into device labs online and saw how intimidating Perth Device Labs and Helsinki Device Lab were. I thought about not going through with it at first because I definitely didn’t have the money to buy that many devices. But then I thought that I could start small and cheap by asking for device donations from people that I knew.

I started off with my girlfriend’s old LG G5. It had some charging problems and a roughed up screen, but I thought it was good enough! It was fortunately fairly straightforward to repair it. (It was straightforward, but it wasn’t easy! Prying it open was a constant reminder of how easy it is to break these things.)

Then I got my mom’s old iPhone 6S. And then finally an iPhone SE for $80 from a second-hand electronics website called Swappa. Definitely a steal considering Apple is currently selling refurbished iPhone SEs for $249.

I installed a ton of web browsers on each of them, and also registered a separate Gmail account in the App and Play stores just in case someone wanted to borrow it. So far it’s been useful. I’ve not only tested websites, but also apps.

I’m not close to being in the lab stage yet (and I’m far from running into the problems that Etsy’s device lab has), but I think it’s a start. Another plus of buying second-hand, fixing, and asking for device donations is that it also puts less strain on the environment:

The little computer you carry with you requires a lot of energy to assemble. The production of an iPhone 6, for example, released the equivalent of 178 pounds of carbon dioxide, or about as much as burning nine gallons of gas, according to a 2015 study. Instead of buying a new phone, try to keep yours in working condition for as long as possible (here’s some advice on how to extend its life). But if you must get rid of yours, recycle it or consider buying a used one.

Where are the bikers?

I briefly visualized some of the data that I got from Indego in my last blog post, and we saw that most people used Indego for work: It spikes at 8am and then again at 5pm. Basically it showed the start of an average work day.

What was also interesting was that activity basically died down during the weekend—something that I didn’t really expect to happen! I really thought that Indego would also spike on the weekends, but what we see is more uniform. It’s still a sizable amount of people, but it makes sense that it’s not bunched up to a specific hour.

It makes sense that people use Indego: you don’t have to worry about theft, flat tires, maintenance, or even carrying a bike up to your apartment. Bikes are also easy to find because there are docks all over the city.

So in this round of data spelunking, I wondered: Which neighborhoods have the most bike borrowers in Philadelphia? And where can you find these bikes?

Based on the GeoJSON data at Indego, there are 130 active stations available in the city, and I plotted them on the map along with neighborhoods where people borrow bikes the most. It turns out that the areas University City, Logan Square, Rittenhouse, and Washington Square West are the ones that see the biggest ridership.

This visualization was created using the data from the fourth quarter of 2018 which had 65,535 bike trips that were recorded from October to December. Here’s a map of the whole of Philadelphia County for a more zoomed-out view:

It’s interesting to see a large number of people borrow bikes in Center City because the area is already packed with cars. It makes sense that people are seeking alternative ways to get to work.

The visualization was created using neighborhood data from Azavea, street data from the City of Philadelphia, and of course data from Indego. I started out by simply displaying the polygons of the neighborhoods:

Next, I wanted to color the polygons that had the bike dock coordinates. Since the biking data is huge, I thought it would be easier and faster to use a smaller, more familiar set from my personal SEPTA trips.

Figuring out if a coordinate fell into a polygon was thankfully straightforward through D3-Geo by calling d3.geoContains. I wrote a small loop that went through each neighborhood polygon and each station coordinate:
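For readers who want to see the idea without D3, here is a minimal sketch of the same kind of loop. It is not the code used for this post: it swaps d3.geoContains (which works on spherical GeoJSON geometry) for a plain planar ray-casting test, which should be a reasonable approximation at city scale. The function names, the two square "neighborhoods", and the station coordinates are all made up.

```python
def point_in_polygon(point, polygon):
    """Planar ray-casting test: is (x, y) inside a ring of (x, y) vertices?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_neighborhood(point, neighborhoods):
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Toy data: two unit squares standing in for neighborhood polygons.
neighborhoods = {
    "West Box": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East Box": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
stations = {"A": (0.5, 0.5), "B": (1.5, 0.25), "C": (3.0, 3.0)}

station_to_neighborhood = {
    station_id: find_neighborhood(coords, neighborhoods)
    for station_id, coords in stations.items()
}
print(station_to_neighborhood)
```

Real neighborhood data would come in as GeoJSON with (longitude, latitude) rings, possibly with holes or multiple parts, which this toy version doesn't handle.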

Running it with the SEPTA information was easy, but it took around 10 minutes when I switched to the biking data. If I’m going to do this with an even larger data set (like Citibike in NYC), I’m going to have to figure out a way to make it run faster.
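One possible speed-up, sketched here as an idea rather than something tested against the real data: every trip ends at one of the 130 docks, so the expensive containment check only needs to run once per station, not once per trip. After that, counting 65,535 trips is just a dictionary lookup each. This assumes each trip row carries a station identifier; the field name `end_station`, the station IDs, and the station-to-neighborhood pairings below are all invented for the example.

```python
from collections import Counter

# Hypothetical result of running the containment test once per station.
# The IDs and pairings are made up for illustration.
station_to_neighborhood = {
    1: "Rittenhouse",
    2: "University City",
    3: "Logan Square",
}

# Hypothetical trip rows: only the ending station matters here.
trips = [
    {"trip_id": 101, "end_station": 1},
    {"trip_id": 102, "end_station": 2},
    {"trip_id": 103, "end_station": 1},
    {"trip_id": 104, "end_station": 3},
    {"trip_id": 105, "end_station": 99},  # a station with no known neighborhood
]


def count_trips_by_neighborhood(trips, station_to_neighborhood):
    """Count trips per neighborhood with one dictionary lookup per trip."""
    counts = Counter()
    for trip in trips:
        neighborhood = station_to_neighborhood.get(trip["end_station"])
        if neighborhood is not None:
            counts[neighborhood] += 1
    return counts


counts = count_trips_by_neighborhood(trips, station_to_neighborhood)
print(counts.most_common())
```

The same lookup table would work for a larger data set, since the number of containment checks grows with the number of stations and not with the number of trips.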

After generating the table of data containing the number of bikers borrowing in each neighborhood polygon, I combined that with the map data (with a lot of help from existing examples) to make this:

I then added the points of all the Indego bike docks to show where they are on the map.

Then finally, I added another layer that draws the streets based on geodata from the City of Philadelphia to add some context on which streets these bikes are located on. Also it looks damn pretty.

One thing to note is that I used the ending latitude and longitude (the dock where they ended up after the trip) to create this map, and it actually surprised me that using the starting coordinates would generate the same map! It turns out that there’s little difference between where people start and where people end because, I’m assuming, they will eventually bike back to where they came from.

I think there are more interesting ways to visualize this data, but I didn’t have much time to do additional explorations. Stay tuned for next time!

What are your commuting habits?

I recently did a lunch and learn at my co-working space, and these are the slides that I used. This post is a little different from the content in the slides, but the gist is the same. Hope you enjoy reading it!

I’ve seen a lot of wonderful data visualizations on websites like Flowing Data or The Pudding, but I never had the desire to do my own visualizations. The data seemed daunting and inaccessible—I mean, where do I even get that kind of data? How do I even make sense of those big data sets? So I put any hopes and dreams of me doing interesting visualizations in the stuff-I’ll-never-get-to-do bucket.

But then I ran into a book called Dear Data. It was a year-long project by two women who each drew their personal data, and sent them across the Atlantic. Here’s a video that describes the project well:

I was inspired. It hadn’t occurred to me that I could visualize my own personal data instead of downloading some big data set out there. I’ve coincidentally been interested in public transit recently, and I thought that visualizing my commuting habits would be a good start.

Gathering data

I happened to know that most of the trips that I’ve made are recorded online because I use a digital key card to ride all modes of public transit here in Philadelphia. That covers trains, buses, and trolleys.

A screenshot of septakey.org which shows the trip history for the digital key that I use.

The hardest part was exporting this data into something that I could play around with. I had to copy and paste each row into an Excel sheet so that I could start working with it as a CSV. It was such an arduous task that if I were to do this again in the future, I would use a tool like Puppeteer to scrape the data for me instead.

A screenshot showing how you can request your ride data on Lyft. It also shows the e-mail that you receive after you request the data.

Getting my data from Lyft was thankfully easy. All I had to do was to go into the app and export the rides into a CSV which you get as an e-mail attachment.

Now what?

I was a bit stuck after I gathered all the data that I needed because I wasn’t sure how to create the visualizations that I wanted.

I vaguely knew of tools like Observable and D3, but the examples looked pretty daunting, especially since I didn’t know how to create SVGs from scratch. Fortunately, I ran into Vega-Lite, which made visualizations a little bit easier because you don’t have to hand-write the SVG graphs.

It took a bit of trial and error before I got the hang of it, but the first thing that I was able to make was a scatter plot showing all the train stations and bus stops that I’ve been to in the last 9 months. In it, you can clearly see that I have been going to Girard Station – MFL most often since that’s where I live, but also 2nd St Station – MFL because that’s where I go to work.

A scatter plot showing each trip that I made and the station that I was in.
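For a sense of how little a Vega-Lite chart needs, here is a minimal sketch of a scatter plot spec, built as a Python dictionary and printed as JSON. It is not the spec used for the chart above: the file name `trips.csv`, the field names `date` and `station`, and the v5 schema URL are all assumptions.

```python
import json


def scatter_spec(csv_url, date_field="date", station_field="station"):
    """Build a Vega-Lite spec with one point per trip (time on x, station on y)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": csv_url},  # hypothetical CSV with one row per trip
        "mark": "point",
        "encoding": {
            "x": {"field": date_field, "type": "temporal"},
            "y": {"field": station_field, "type": "nominal"},
        },
    }


# Print the spec as JSON, ready to paste into the Vega-Lite online editor.
print(json.dumps(scatter_spec("trips.csv"), indent=2))
```

The spec only declares which column goes on which axis, and Vega-Lite takes care of the scales, axes, and drawing.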

Compressing that scatter plot to show just the modes of transportation, you’ll see that I only started using the bus and the trolley in October and November of 2018. I used to be very confused by the bus, but apps like Citymapper and Transit App have made it a lot more accessible for me.

A plot showing the modes of transportation that I’ve been taking each month.

One thing that I really wanted to know was when I took public transit, and thankfully Vega-Lite makes this easy. They even have an example of it!

The result was pretty, but it was a bit disappointing to see how random my trips are. There is some insight in there, though: it looks like I don’t travel much on Tuesdays or Thursdays, but I travel a lot on Fridays, Saturdays, and Sundays to hang out with people and do chores.

A punch card showing when I commute and travel around Philly.
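The aggregation behind a punch card is just a count of trips per (day of week, hour) cell. Vega-Lite can do this on its own, but here is a minimal Python sketch of the same idea. The function name and the timestamp format are assumptions.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def punch_card(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count trips per (weekday, hour) cell from timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, fmt)
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[(DAYS[dt.weekday()], dt.hour)] += 1
    return counts
```

Each key becomes one circle on the chart, and the count decides how big it is drawn.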

Lyft Data

Another thing that I wanted to see was how my transit expenses have changed over the last couple of months. I used to be an avid Lyft user because of its convenience, but it’s really hurt my wallet in the past. So I wanted to compare that data with my public transit data.

What came out was honestly pretty disgusting. I spent so much on Lyft in August and September that it hurts to look at it. This was pretty much my breaking point and why I’ve been taking public transit more. In October I said to myself that I wouldn’t spend that much money just to go around a city.

A graph showing the cost of taking public transit vs. Lyft.

What’s also interesting is that I’m moving around more than ever. I graphed the number of rides I’m taking (that is, how often I commute), and I’m at an all-time high—all without the associated costs.

A graph showing the number of trips I’ve been taking on public transit vs. Lyft.
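Both of those graphs boil down to grouping rides by month and by mode, then either summing the cost or counting the rows. A minimal Python sketch, assuming each ride has already been reduced to a hypothetical `(date, mode, cost)` tuple with an ISO-style date:

```python
from collections import defaultdict


def monthly_totals(rides):
    """rides: iterable of (date 'YYYY-MM-DD', mode, cost).

    Returns {month: {mode: (ride_count, total_cost)}}.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for date, mode, cost in rides:
        month = date[:7]  # 'YYYY-MM'
        cell = totals[month][mode]
        cell[0] += 1      # number of rides
        cell[1] += cost   # money spent
    return {
        month: {mode: tuple(cell) for mode, cell in modes.items()}
        for month, modes in totals.items()
    }
```

Keeping the count and the cost side by side makes it easy to plot either one from the same table.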

You can see that I still take some Lyft rides, but these days I only use them when I’m in a hurry or if I’m carrying huge bags of groceries. There’s research that says that ride-sharing and public transit are complementary. I agree.

Mapping

Wouldn’t it be cool to see all of my trips on a map? Now I don’t know much about mapping, but Leaflet seemed like a good place to start so I read up on that. Unfortunately, I had to map the stations to actual lat-lng coordinates that I found on Google Maps. It was tedious work, but I did manage to get a heat map working.

In the heat map below, you’ll see that I’m generally in three locations: home, work, or Center City. No surprises there.

A map that I created using Leaflet and a plugin called Leaflet.heat.
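Leaflet.heat takes a list of `[lat, lng, intensity]` points, so the data prep is mostly a lookup from station name to coordinates. Here is a minimal Python sketch of that step. The function name and the input shapes are assumptions. It also reports any station that has no coordinates, which is an easy thing to miss when the lookup table is built by hand.

```python
from collections import Counter


def heat_points(station_visits, coords):
    """Turn station names into [lat, lng, intensity] rows for Leaflet.heat.

    station_visits: list of station names, one entry per trip.
    coords: {station name: (lat, lng)}, for example looked up by hand.
    Returns (points, missing); missing lists stations without coordinates.
    """
    visits = Counter(station_visits)
    peak = max(visits.values(), default=1)
    points, missing = [], []
    for name, n in sorted(visits.items()):
        if name in coords:
            lat, lng = coords[name]
            # Scale intensity so the busiest station is 1.0.
            points.append([lat, lng, n / peak])
        else:
            missing.append(name)
    return points, missing
```

The `points` list can be dumped as JSON and passed to `L.heatLayer` on the JavaScript side.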

Zooming in closer shows the specific stations that I take. Everything looks right except that 2nd St Station is missing, so I might have made a mistake on the coordinates there.

A zoomed-in map showing the individual bus stops and stations that I’ve been to.

Bonus Round

So in Philadelphia there’s another mode of transportation that I haven’t talked about: bikes! I’m a bit too scared to ride a bike in the city (for now), but the City of Philadelphia publishes data every quarter on all the bike trips made.

A bunch of people on their Indego bikes.

So I took that data and looked at what would happen if I simply plugged it into my existing graphs and maps.

The heat map generated is interesting. With a few tweaks it shows that a lot of the trips (at least in the first quarter of 2018) are concentrated in the center of Philly. There are some blips in University City on the left of the blob, and some in the museum area on the upper left corner of the blob.

A heat map generated from all the Indego bikers in the first quarter of 2018.

I also wanted to know when people borrowed the bikes. If you had asked me, I would’ve assumed that people used the bikes more on the weekends or for leisure. But when I ran the data, it clearly showed that people use them mostly for work. You can see the 9am and 5pm crowds, and you can also see usage dying down on the weekends.

It was interesting because I assumed that people who biked to work owned their bikes. But at $17 a month, it looks like Indego is a good deal for people not wanting to pay upfront for a bike, do maintenance on it, and worry about it getting stolen.

A punch card chart of everyone’s bike trips in the first quarter of 2018.
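One way to put numbers on the commute pattern is to split trips into weekday and weekend buckets and count them by starting hour. A minimal Python sketch, assuming the start times are available as strings. The function name and the timestamp format are hypothetical and may not match the published files.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(start_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (weekday_counts, weekend_counts), each a Counter keyed by hour."""
    weekday, weekend = Counter(), Counter()
    for ts in start_times:
        dt = datetime.strptime(ts, fmt)
        # weekday() is 5 for Saturday and 6 for Sunday.
        bucket = weekend if dt.weekday() >= 5 else weekday
        bucket[dt.hour] += 1
    return weekday, weekend
```

If commuting dominates, the weekday counter should peak around the morning and evening rush hours while the weekend counter stays flatter.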

Source code