Movie ratings since 2020 from IMDb
MAT 259, 2024
Jing Peng
Concept
“IMDb (an acronym for Internet Movie Database) is an online database of information related to films,
television series, podcasts, home videos, video games, and streaming content online – including cast,
production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.
IMDb began as a fan-operated movie database on the Usenet group "rec.arts.movies" in 1990, and moved to the
Web in 1993. As of March 2022, the database contained some 10.1 million titles (including television
episodes), 11.5 million person records, and 83 million registered users.”
User ratings of films
As one adjunct to data, the IMDb offers a rating scale that allows users to rate titles on a scale of one to
ten.
Using the IMDb Non-Commercial Datasets, I want to find out whether there is a connection between a movie's
rating and its popularity (represented here by the number of votes) in recent years. To some extent, this
can be seen as a reflection of movie trends in modern society.
Preliminary sketches
The sketch is attached below. For the visualization, the three axes are year, numVotes, and rating.
numVotes determines the point size and transparency, and the rating determines whether the title is drawn
horizontally or vertically.
Process
I selected movies released since 2020 from the IMDb Non-Commercial Datasets and joined them with their
ratings using Dask, a Python library for parallel and distributed computing.
I kept only films with more than 5000 votes (numVotes), for two reasons:
1. To reduce the number of data points and make the visualization less cluttered.
2. To focus on trending movies: opinions about movies are subjective, so with too few votes the average
rating can easily become extreme.
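The selection described above can be sketched as follows. This is not the project's Dask code: it is a plain-Python illustration of the same filter-and-join logic, written with the standard library so that it runs without extra packages. The column names (tconst, titleType, primaryTitle, startYear, genres, averageRating, numVotes) follow the IMDb title.basics and title.ratings files; the function name select_movies and the sample rows are made up for illustration.

```python
import csv
import io

MIN_YEAR = 2020    # "since 2020", from the text
MIN_VOTES = 5000   # vote threshold, from the text


def select_movies(basics_file, ratings_file, min_year=MIN_YEAR, min_votes=MIN_VOTES):
    """Join title.basics with title.ratings; keep recent movies with enough votes."""
    # IMDb TSVs are tab-separated and unquoted, so disable quote handling.
    tsv = {"delimiter": "\t", "quoting": csv.QUOTE_NONE}

    # Index the ratings that pass the vote threshold by title id.
    ratings = {}
    for row in csv.DictReader(ratings_file, **tsv):
        votes = int(row["numVotes"])
        if votes > min_votes:
            ratings[row["tconst"]] = (float(row["averageRating"]), votes)

    selected = []
    for row in csv.DictReader(basics_file, **tsv):
        # IMDb writes missing values as the two characters \N.
        if row["titleType"] != "movie" or row["startYear"] == r"\N":
            continue
        year = int(row["startYear"])
        if year >= min_year and row["tconst"] in ratings:
            rating, votes = ratings[row["tconst"]]
            selected.append({
                "title": row["primaryTitle"],
                "year": year,
                "genres": row["genres"],
                "rating": rating,
                "numVotes": votes,
            })
    return selected


# Tiny made-up sample in the IMDb layout (not real IMDb rows).
SAMPLE_BASICS = io.StringIO(
    "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
    "tt001\tmovie\tFilm A\t2021\tDrama\n"
    "tt002\tmovie\tFilm B\t2019\tComedy\n"
    "tt003\ttvSeries\tShow C\t2022\tAction\n"
    "tt004\tmovie\tFilm D\t\\N\tDrama\n"
    "tt005\tmovie\tFilm E\t2023\tAction,Thriller\n"
)
SAMPLE_RATINGS = io.StringIO(
    "tconst\taverageRating\tnumVotes\n"
    "tt001\t7.4\t12000\n"
    "tt002\t8.1\t90000\n"
    "tt003\t6.5\t30000\n"
    "tt004\t5.0\t7000\n"
    "tt005\t6.2\t4000\n"
)
movies = select_movies(SAMPLE_BASICS, SAMPLE_RATINGS)
```

On the sample, only the 2021 movie with 12000 votes survives: the others are too old, not movies, missing a year, or below the vote threshold.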
Final result
The visualization has three axes: X is the rating, Y is numVotes, and Z is the year. numVotes determines
the point size and transparency, and the rating determines whether the title is drawn horizontally or
vertically. As the visualization shows, most of the popular movies have fairly good ratings.
There could be several reasons:
1. Well-produced: Popular movies usually have large budgets that pay for top-notch acting, directing,
cinematography, and special effects, which audiences usually recognize and appreciate.
2. Audience Expectations: Expectations for popular movies are usually higher, so people may be more
forgiving of minor flaws or shortcomings in the movie and more focused on the overall enjoyment and impact
of the movie.
3. Individual Subjective Factors: Ratings of popular movies may be affected by selection bias, whereby
people who are more inclined to appreciate a particular type of movie (e.g., action, comedy, drama) are more
likely to watch and rate the movie, resulting in a higher average rating.
Here, I set the boundary rating at 7.0. The title of a movie whose rating is above 7.0 is drawn
horizontally; otherwise it is drawn vertically.
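The visual encoding described in this section can be sketched as a small mapping from one movie to its drawing attributes. The axis assignment and the 7.0 boundary come from the text. The linear scaling of size and transparency, the ranges (min_size, max_size, min_alpha), and the names PointStyle and style_for are assumptions made for illustration; the text does not state how the project scales these values.

```python
from dataclasses import dataclass

RATING_THRESHOLD = 7.0  # boundary rating, from the text


@dataclass
class PointStyle:
    x: float          # rating
    y: float          # numVotes
    z: float          # year
    size: float       # point size, grows with numVotes
    alpha: float      # opacity in [0, 1], grows with numVotes
    horizontal: bool  # True: title drawn horizontally


def style_for(rating, num_votes, year, max_votes,
              min_size=2.0, max_size=20.0, min_alpha=0.2):
    """Map one movie to its position and appearance (assumed linear scaling)."""
    # Share of the largest vote count in the data set, clamped to 1.
    share = min(num_votes / max_votes, 1.0)
    size = min_size + share * (max_size - min_size)
    alpha = min_alpha + share * (1.0 - min_alpha)
    return PointStyle(
        x=rating, y=num_votes, z=year,
        size=size, alpha=alpha,
        horizontal=rating > RATING_THRESHOLD,
    )


# Hypothetical movie with half as many votes as the most-voted title.
style = style_for(rating=8.2, num_votes=5000, year=2021, max_votes=10000)
```

Vote counts are often heavily skewed, so a logarithmic scale may separate the points better than the linear one assumed here.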
Users who want to check the information for a specific movie can enter its name in the input field. A list
of matching movies is then shown in the scrollable list.
Colors distinguish the movies by genre.
You can choose any of these movies to see the detailed information including its year, rating, number of
votes, and genre.
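The name search and detail view described above can be sketched as follows. The matching rule (case-insensitive substring match on the title) is an assumption, since the text does not say how names are matched; search_movies, describe, and the sample records are hypothetical.

```python
def search_movies(movies, query):
    """Return movies whose title contains the query, ignoring case."""
    needle = query.strip().casefold()
    if not needle:
        return []  # an empty input field matches nothing
    return [m for m in movies if needle in m["title"].casefold()]


def describe(movie):
    """Format the details shown when a movie is chosen from the list."""
    return (f'{movie["title"]} ({movie["year"]}): rating {movie["rating"]}, '
            f'{movie["numVotes"]} votes, {movie["genres"]}')


# Made-up records for illustration (not real IMDb rows).
catalog = [
    {"title": "Film A", "year": 2021, "rating": 7.4, "numVotes": 12000, "genres": "Drama"},
    {"title": "Another Film", "year": 2022, "rating": 6.8, "numVotes": 8000, "genres": "Comedy"},
    {"title": "Show C", "year": 2023, "rating": 8.0, "numVotes": 30000, "genres": "Action"},
]
matches = search_movies(catalog, "film")
details = describe(matches[0])
```

The list of matches would feed the scrollable list, and the formatted string corresponds to the details shown for the chosen movie.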
Code