Most viewed people articles on Wikipedia by year of death

For fun, and to practise my rusty Pandas skills, I calculated and plotted the most viewed Wikipedia article about a person for each year of death. In other words, the most "popular" death in each of the past 122 years. "Most viewed" here means the number of views each page received during 2020, so the ranking is biased towards more recent deaths, which are still occasionally mentioned in the news and on TV.
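The core calculation can be sketched in a few lines of Pandas. This is a minimal illustration rather than the actual code from the repository: the column names (title, death_year, views_2020) and the placeholder rows are assumptions made up for the example.

```python
import pandas as pd


def most_viewed_per_year(df: pd.DataFrame) -> pd.DataFrame:
    """Return the row with the highest 2020 view count for each year of death."""
    # idxmax gives, for each death_year group, the index label of the top row.
    idx = df.groupby("death_year")["views_2020"].idxmax()
    return df.loc[idx].sort_values("death_year").reset_index(drop=True)


# Placeholder data (hypothetical people and counts, for illustration only).
people = pd.DataFrame(
    {
        "title": ["Person A", "Person B", "Person C", "Person D"],
        "death_year": [1900, 1900, 1901, 1901],
        "views_2020": [120, 450, 80, 30],
    }
)

top = most_viewed_per_year(people)
print(top)
```

Using idxmax keeps the whole row (title included) for each winner, which a plain groupby(...).max() would not guarantee.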

Most viewed people articles on Wikipedia by year of death. (This is cropped to avoid the image taking up a huge amount of vertical screen real estate.)

The data were downloaded using the Wikipedia API. The "Deaths by year" category provides the links to the (~500k) people who have died since 1900, and another API endpoint (unfortunately not provided as part of the same API service) provides the pageviews as a function of time. I used pageviews for 2020, because total views (i.e. since Wikipedia started recording them) weren't available.
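For anyone wanting to try something similar, here is a rough sketch of how the two requests might be put together. It only builds the request parameters and URL and sums a response, without touching the network. Several things here are my assumptions rather than details from the original code: that the per-year categories are titled "Category:<year> deaths", the choice of the "user" agent and "monthly" granularity, and the helper names themselves.

```python
from urllib.parse import quote

# Wikimedia pageviews REST endpoint (per-article metrics).
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def category_params(year, cmcontinue=None):
    """Query parameters for listing one year's deaths via the MediaWiki Action API.

    Assumption: per-year categories are titled "Category:<year> deaths".
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{year} deaths",
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        # Results are paged; pass the token from the previous response.
        params["cmcontinue"] = cmcontinue
    return params


def pageviews_url(title, start="20200101", end="20201231"):
    """URL for one article's pageviews between start and end (YYYYMMDD)."""
    # Titles use underscores for spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return (
        f"{PAGEVIEWS_BASE}/en.wikipedia/all-access/user/"
        f"{article}/monthly/{start}/{end}"
    )


def total_views(payload):
    """Sum the 'views' field over all items in a pageviews response."""
    return sum(item["views"] for item in payload.get("items", []))


url = pageviews_url("Dwight D. Eisenhower")
# Stand-in payload with made-up numbers, shaped like a pageviews response.
views = total_views({"items": [{"views": 10}, {"views": 32}]})
```

The parameters from category_params would be sent to https://en.wikipedia.org/w/api.php, while the pageviews URL lives on a separate host, which is the "not part of the same API service" annoyance mentioned above.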

I manually grouped each of the 122 dead people into categories. These were entirely subjective, and many people would have fallen into multiple categories or into other, unlisted ones. I tried not to miscategorise anyone. Categorisation wasn't the main point of the exercise, but it did show that English Wikipedia viewers love to look at the pages of dead politicians and royalty.

Another caveat: Wikipedia's pageview API seems to return the pageviews of redirected articles as those of the target article. This is logical behaviour for misspellings of people's names, but it also misattributes views for redirects such as family members of famous people. In these cases, the page for the (dead) relative redirects to the (sometimes alive) famous person, and the dead relative gets assigned all of the famous person's pageviews. I ended up with Dwight D. Eisenhower's, Richard Nixon's and Vladimir Putin's fathers, Joe Exotic's wife, and Kobe Bryant's daughter as the "most viewed" people in their respective years of death. Since there were only a few, I manually removed these cases.
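I removed these by hand, but with more cases it might be worth flagging redirects automatically. One possible approach (not what I did) is to query the MediaWiki Action API with action=query and redirects=1, which lists resolved redirects under query.redirects as "from"/"to" pairs. The sketch below only parses such a response; the helper name and the sample titles are hypothetical.

```python
def redirect_targets(payload):
    """Map each redirected title to its target, given a parsed API response.

    Expects the JSON returned by action=query with redirects=1.
    """
    redirects = payload.get("query", {}).get("redirects", [])
    return {r["from"]: r["to"] for r in redirects}


# Stand-in response with placeholder titles, for illustration only.
sample = {
    "query": {
        "redirects": [
            {"from": "Relative of Famous Person", "to": "Famous Person"}
        ],
        "pages": {},
    }
}

flagged = redirect_targets(sample)
# Any title that appears as a key here is a redirect, so its pageviews
# really belong to the target article and the row could be dropped.
```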

While downloading the data, I also noticed that some pages had been deleted in the time between my downloading the "Deaths by year" category page and downloading each article's pageviews. I assumed these pages were deleted because the subject was not considered relevant to Wikipedia, so I set such people's pageviews to 0. I also did this for any articles that returned an HTTP 404 code on the pageviews API, which the documentation stated could indicate that no one viewed the page in the period in question.
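That rule is simple enough to sketch as a small function. This is an illustration under my own assumptions (the function name is made up, and it expects the status code and parsed JSON from whichever HTTP client is used), not a copy of the code in the repository.

```python
def views_or_zero(status_code, payload=None):
    """Total views for one article, treating HTTP 404 as zero views.

    A 404 covers both deleted pages and pages with no recorded views.
    """
    if status_code == 404:
        return 0
    if status_code != 200:
        # Anything else (rate limiting, server errors) should not be
        # silently counted as zero views.
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    return sum(item["views"] for item in (payload or {}).get("items", []))


# Stand-in responses with made-up numbers.
missing = views_or_zero(404)
found = views_or_zero(200, {"items": [{"views": 7}, {"views": 5}]})
```

Raising on other status codes is a deliberate choice here: treating a temporary server error as "no views" would quietly distort the ranking.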

So, there we are. That was a relatively fun experiment, and I feel like I've learned a bit more about Pandas. For anyone interested, the code and data are on GitHub.