Thank you for 18 years of DVDs, Netflix
Analyzing my family's DVD rental history in honor of Netflix sunsetting the service
In addition to writing about my experiences and lessons learned at an early-stage startup, I also like to do data analyses for fun sometimes! I usually investigate some data I’ve collected about my personal life. If you’re interested in seeing the code and data detours I took in writing this post, check out the full notebook on my website. If you just want the story, read on!
Soon, Netflix will be canceling its DVD-by-mail program, the original service that helped Netflix crush Blockbuster and got us used to watching movies on-demand from the comfort of our homes before streaming was a thing. Perhaps not coincidentally, my dad cancelled my family's subscription to the DVD service this winter. As my brother wisely put it upon hearing my dad’s news, "Netflix can finally stop buying physical DVDs now that their last customer cancelled!"
Of course, when I saw that Netflix had kept my parents' entire DVD history I knew I had to look at the data. According to the history, my family signed up for Netflix in 2004 - that's almost 20 years of DVDs! For most of that time, we were on a plan that let us have 3 DVDs concurrently. At some point while I was in high school, I was given full control over one of the 3 DVDs on our plan. Looking at the DVD history, though, this must have been before Netflix even had the concept of separate accounts - my Netflix account on my family’s plan only shows ~10 DVD rentals, but I distinctly remember years of freedom to discover indie movies and curl up in our game room watching movies on my own in high school. It was such a treat to have my own stream of movies that I had full control over. In fact, I still really miss watching indie movies and discovering other excellent movies from their trailers - Netflix's algorithm really hasn't figured me out as well as those trailers had.
Getting the data
Anyway, onto the data. When you log in to dvd.netflix.com (a separate website from netflix.com, lol), the history is very simply shown in a table.
I didn't see any easy way to just export this table, and I didn't really want to try too hard to find a legit way to scrape the site (especially since I figured there'd be complex auth to get around), so I started with the good ol' "inspect page" method. (Actually, I started with copy-paste but that didn't work.) Turns out the information was easily accessible in the html itself, so I went ahead and just downloaded the html pages for my mom and dad's account histories. My parents each started using their own account to rent DVDs (once the concept of accounts was implemented), and my dad's account had about 500 entries on it, so I figured it might have different movies on it.
With a combination of BeautifulSoup's documentation, poking around via Chrome's inspect tool, and good ol' ctrl-F, I was able to pretty easily figure out how to extract all the information I needed.
I’ll spare you the code to extract the information here, but you can check it out on my blog if you’re interested.
After I extracted the information from the html page, I got a table of data that looks like this:
Cleaning the data
First, I wanted to ask some basic questions of this dataset: how many movies did we rent, and did we rent any of the same movie twice?
Of course, these questions led me on a very long data wrangling detour which led me to discover:
The history I downloaded from my dad's account is a subset of the history in my mom's. In other words, every movie in my dad’s history is also in my mom’s history.
But the movies in my mom’s and dad’s histories don’t always have the same user rating, which makes sense, or average rating, which doesn’t. I’m assuming the average rating is Netflix’s average rating, so I have no idea why they would differ based on whose history I was looking at. So it seems that doing any sort of analysis on the ratings might be a bit more complicated. But that’s ok, I wasn’t planning on digging into that anyway.
In the data, TV shows have their season name (e.g. “Season 1”) in the field where movies have their year, and a Disc number (e.g. “Disc 2”) where movies have their duration. If I wanted to do an analysis of movies vs. TV shows, I could use this to infer the category of each rental.
Two movies (The Big Sick and The Harvey Girls) have the exact same ship date but different return dates. I'm assuming that's a mistake in Netflix's data, unless my parents and I both rented the same DVD on the exact same day and returned them both exactly three days apart from each other. But given that I've still never seen The Big Sick and don't know what The Harvey Girls is, I'm betting on dirty data.
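For the curious, here's a minimal sketch of how the subset check and the movie-vs-TV inference from the list above could look. The rows, titles, and function names are all made up for illustration; this is not the code from my notebook, and it assumes each rental has been reduced to a (title, year-or-season, duration-or-disc) tuple.

```python
import re

# Hypothetical rows: (title, year_or_season, duration_or_disc). Not real data.
mom_history = [
    ("Movie A", "2004", "1h 30m"),
    ("Show B", "Season 1", "Disc 2"),
    ("Movie C", "2017", "2h 0m"),
]
dad_history = [
    ("Movie A", "2004", "1h 30m"),
    ("Show B", "Season 1", "Disc 2"),
]


def is_subset(smaller, larger):
    """Return True if every row in `smaller` also appears in `larger`."""
    return set(smaller) <= set(larger)


def infer_category(year_or_season):
    """Guess 'movie' when the field is a 4-digit year, otherwise 'tv'."""
    if re.fullmatch(r"\d{4}", year_or_season.strip()):
        return "movie"
    return "tv"


print(is_subset(dad_history, mom_history))
print([infer_category(row[1]) for row in mom_history])
```

Comparing whole tuples only works if the fields are identical across both accounts. Since the ratings differed between the two histories, they are deliberately left out of the comparison key here.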
18 years of rentals
After cleaning the data, I could finally start on some analysis. First off - how long did we use the service and how many unique rentals did we make?
Wow, we were signed up for Netflix's DVD service for over 18 years! That's pretty amazing, and probably outlasts every commitment my parents made apart from maybe their longest jobs and homes and, oh right, kids.
In that time, we made a total of 1596 unique rentals. Of these, 1441 were for movies we rented only once. We rented 76 movies twice, and one movie three times. Let's see what the lucky movie was!
Looks like it's Before Sunset, which makes sense - it came out in 2004, and was probably a movie that my parents and I both rented separately while I was in high school, and then that I guess my parents re-watched in 2019.
1596 rentals over 18 years is a little over 88 movies per year, which is about 1.7 movies per week for 18 years. That’s pretty impressive.
Let's see if we can visualize this data nicely. I'll use the return date as a proxy for when the movie was watched, since we were usually pretty prompt about returning the movies after watching them. After a quick Google, I found a nice and simple Python library that makes GitHub-style calendar plots, nice!
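As a quick sanity check on those numbers, here's a back-of-the-envelope sketch. The counts come straight from the text above. The variable names are just for illustration, and I'm assuming a flat 18 years and 52 weeks per year.

```python
# Counts of titles by how many times each was rented (from the text above)
rented_once, rented_twice, rented_thrice = 1441, 76, 1

# Total rental events: each title contributes as many rentals as times rented
total_rentals = rented_once * 1 + rented_twice * 2 + rented_thrice * 3

# Distinct titles, regardless of how often each was rented
unique_titles = rented_once + rented_twice + rented_thrice

years = 18  # assumption: round down to a flat 18 years
per_year = total_rentals / years
per_week = per_year / 52

print(total_rentals, unique_titles)             # 1596 1518
print(round(per_year, 1), round(per_week, 1))   # 88.7 1.7
```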
Daily rental patterns
First off, a quick guide to reading this sort of plot. Each subplot represents one year and has 365 boxes, where each box is a day. Each row is a day of the week and each column is a week. The boxes are colored by how many DVDs were returned on that day (darker means more DVDs). So if you see a row with lots of filled-in boxes like a horizontal line, that means that we returned DVDs on the same day of the week across multiple weeks. Let's say it's the second row from the top, that would mean that Tuesdays are a frequent return day. Seeing a column of filled-in boxes would mean that we returned DVDs every day on a given week.
Ok, now that we're oriented we can start to pick out some patterns. The first sort of pattern that sticks out to me is about which days we returned the DVDs. First, it looks like we rarely returned movies on two different days per week - you can see that there are very few columns with two filled-in boxes. Second, we never returned DVDs on Saturday or Sunday (there are no filled-in boxes in the bottom two rows). It would make sense for Netflix's DVD receiving department to be closed on weekends, so that checks out. Finally, it looks like our most frequent return day is Tuesday - that makes perfect sense! My parents tend to watch movies over the weekend, which means the DVDs would get picked up by USPS on Monday and received by Netflix the next day, on Tuesday.
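If you'd rather check these patterns with numbers than by squinting at a plot, something like the sketch below would do it. The dates are made up for illustration (they are not from my family's history), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up return dates, purely for illustration
return_dates = [
    date(2012, 3, 6),
    date(2012, 3, 13),
    date(2012, 3, 19),
    date(2012, 3, 27),
]


def returns_by_weekday(dates):
    """Count how many returns landed on each day of the week."""
    return Counter(WEEKDAYS[d.weekday()] for d in dates)


def weeks_with_multiple_return_days(dates):
    """Count ISO weeks that had returns on two or more different days."""
    days_per_week = defaultdict(set)
    for d in dates:
        iso_year, iso_week = d.isocalendar()[:2]
        days_per_week[(iso_year, iso_week)].add(d.weekday())
    return sum(1 for days in days_per_week.values() if len(days) >= 2)


weekday_counts = returns_by_weekday(return_dates)
print(weekday_counts.most_common(1))
print(weeks_with_multiple_return_days(return_dates))
```

A day of the week that never shows up in the counter (like Saturday or Sunday in my data) corresponds to an empty row in the calendar plot.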
Another observation is that we would sometimes go months without returning any DVDs - you can see this as areas where there are multiple columns in a row of empty boxes. My guess is that these likely correspond to periods when my parents were on vacation, out of town, or otherwise busy. You can see examples of these gaps in June and July of a handful of years, which is what tipped me off to this vacation hypothesis. But there's a lot of gaps, so I don't think I'll ask them to corroborate this hypothesis with the most recent dates of their big RV trips.
Return day consistency seems informative
Finally, the consistency of which days of the week we returned DVDs is intriguing - there are some years where it's really consistent (the filled-in boxes are all on the same two-ish rows) and others where it's not. From just looking at the plot, it seems that 2005-2009 didn't have super consistent return days of the week - that makes sense, this was the period where I lived at home and had my own dedicated DVD (before the days of password sharing, this is how we shared accounts!). My guess is that I either watched movies on weekdays sometimes or, more likely, was less prompt at returning them after I watched them, which would explain the variety of return days.
The following 5 years, 2010-2015, had a much more consistent return day pattern, with most returned on Mondays or Tuesdays. This also makes sense, as this was the period where my parents were both empty nesters but still working: during this time, they would have been more likely to watch movies on the weekend than during the week, thus mailing them back on Mondays and Netflix receiving them on Tuesdays.
Then 2016 and 2017 are less consistent again - my guess is that this is right around when my mom retired. When I was talking to my parents about this analysis, my mom mentioned that when she first retired she watched a lot of TV series on Netflix DVDs. I also know that it took my parents a while to start paying for all the streaming services, so it would make sense that in these first few years after retirement you see a lot less consistency in the return days, as my mom was likely burning through TV shows via Netflix's DVD service!
2018 and onwards gets decently consistent again. My hypothesis here is that 2018 is around when my parents started paying for and using streaming services, so they stopped watching as many movies and TV shows on Netflix DVDs. That would leave DVDs only for the more obscure foreign films or recently released movies not yet available on streaming that they wanted to watch, and everything else would have been watched via streaming. In this scenario, it makes sense that the behavior would revert to a consistent early-week return date: my dad was still working, and so I assume that they watched the movies that they ordered from Netflix on the weekends, and my mom watched other things during the week via streaming services.
Finally, 2021 and 2022 are slightly less consistent and much more sparse than any of the other years. My dad retired in December 2021, which is when my parents started taking a lot of trips in their RV. But I don't think that's what explains the sparseness - my guess is that they switched their plan from 3 DVDs to 1 sometime in 2021, which led to the slow death of their usage of the service.
Bring in the parents: putting my hypotheses to the test
I texted my parents to see if I could confirm some of these hypotheses. First off, my mom retired in December 2015 - huzzah, I was right! Pretty cool that you can see her retirement just in the distribution of return days of the week.
Then, my dad told me that they had access to Netflix’s streaming as soon as it started in 2007, but my mom doesn’t think they started using it regularly until around 2015. My parents also got an Apple TV box in November 2020, which made streaming very easy across the various services. So that doesn’t check out with my “2018 is when they started streaming regularly” hypothesis - something else must have happened in 2018 that got them back to a more consistent DVD viewing pattern. Maybe my mom ran out of TV shows that Netflix had on DVD?
Finally, they switched to the plan with only 1 DVD in August 2022 - way after the 2021 sparseness started! So it must have gone the other way: their utilization was going down, and so they downgraded their plan.
Movie quantity over time
Next up, I want to look at a more high-level summary of the amount of movies we watched. My guess is that we watched way more while I was still living at home before 2009, and then that it spiked again after my mom retired in December 2015. I might also guess that my parents watched more movies in 2020 and 2021 during Covid, but I'm not sure if that would be reflected in the number of DVDs since that's also when they were using streaming services.
Welp, nope - doesn't look like there's any discernible pattern in terms of the number of movies we watched over the years. It's very interesting to me that you don't see any obvious decreases when I moved out or even when my parents bumped their plan down to one DVD at a time (but maybe that's because there isn't enough data to see that).
Utilization rate
I wonder how this compares to the maximum possible number of movies per month. Let's do some back-of-the-envelope math!
Assuming:
we have a plan that lets us have 3 DVDs at a time
we can only watch one movie per day
it takes Netflix one day to process a returned movie and ship out the next one
they send it with 2 day shipping to get to us
and when we mail it back, it goes with overnight return shipping
That means that each movie takes up a total of 5 days (1 day to be processed by Netflix + 2 days in the mail to get to us + 1 day to be watched + 1 day to return to Netflix). So each of the 3 DVDs can go through 6 full rental cycles per month, meaning that the max number of movies we could watch in a month is 18.
On average, my family watched 7.3 movies per month - a little less than half of the possible rentals. But there were some months when we went through 14 movies, a roughly 78% utilization rate! For a working family who definitely does not watch movies every day, not bad.
With that, thanks for joining me on this journey down Netflix memory lane! RIP Netflix DVD service, you were a true trailblazer ahead of your time. From those of us who were your loyal fans for almost two decades, so long and thanks for all the movies.
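Here's the same back-of-the-envelope math as a tiny sketch, in case you want to plug in your own plan. The function name and the 30-day month are assumptions for illustration. The 5-day cycle and the 3-DVD plan come from the assumptions listed above.

```python
def max_rentals_per_month(dvds_at_a_time=3, days_per_cycle=5, days_per_month=30):
    """Upper bound on rentals: each DVD slot completes whole cycles only."""
    cycles_per_slot = days_per_month // days_per_cycle
    return dvds_at_a_time * cycles_per_slot


max_monthly = max_rentals_per_month()       # 3 slots * 6 cycles = 18
average_utilization = 7.3 / max_monthly     # average month from the text
peak_utilization = 14 / max_monthly         # busiest months from the text

print(max_monthly)                     # 18
print(f"{average_utilization:.0%}")    # 41%
print(f"{peak_utilization:.0%}")       # 78%
```

For comparison, calling `max_rentals_per_month(dvds_at_a_time=1)` gives a ceiling of 6 movies per month under the same assumptions.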