External Correlation
MAT 259, 2012
Ankit Srivastava

Introduction
The objective of this project was to correlate two different data sets and find similar patterns in both. I chose to correlate the library checkout patterns for movie series such as Pirates of the Caribbean and Harry Potter with data on the box-office performance of the new installments of these series released in 2011.

In the visualization, I used a treemap to represent the top 10 grossing movies of 2011 and a traditional bar graph to represent the monthly checkouts of those movie series at the Seattle Public Library (SPL) in 2011.

Background and Sketches
Movies Data Set

This data set contains information about the top 100 movies of 2011, including budget, total gross, and Rotten Tomatoes rating. I used the treemap to represent the top 10 grossing movie series installments of 2011. Both the color saturation and the area of each rectangle indicate the ranking: the darker and larger the rectangle, the higher the movie's total gross.
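The encoding described above (rectangle area and color saturation both growing with total gross) can be sketched in a few lines. This is a minimal sketch, not the project's actual Processing code: it assumes the simplest treemap layout, slice-and-dice, which for a flat list of movies reduces to cutting the canvas into strips whose widths are proportional to gross. The class, record, and method names and the gross values in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TreemapSketch {
    // One laid-out rectangle: position, size, and fill saturation in [0, 1].
    public record Cell(String title, double x, double y, double w, double h, double saturation) {}

    // Slice-and-dice layout for a flat list: each movie gets a vertical strip
    // whose width (and therefore area) is proportional to its gross.
    public static List<Cell> layout(String[] titles, double[] gross, double width, double height) {
        double total = 0;
        double max = 0;
        for (double g : gross) {
            total += g;
            max = Math.max(max, g);
        }
        List<Cell> cells = new ArrayList<>();
        double x = 0;
        for (int i = 0; i < titles.length; i++) {
            double w = width * gross[i] / total;
            // Saturation grows with gross; the top-grossing movie gets 1.0.
            double saturation = gross[i] / max;
            cells.add(new Cell(titles[i], x, 0, w, height, saturation));
            x += w;
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical gross values, for illustration only.
        String[] titles = {"Movie A", "Movie B", "Movie C"};
        double[] gross = {300, 200, 100};
        for (Cell c : layout(titles, gross, 600, 400)) {
            System.out.printf("%s x=%.1f w=%.1f sat=%.2f%n", c.title(), c.x(), c.w(), c.saturation());
        }
    }
}
```

In Processing, each cell's saturation could then be passed to an HSB fill before drawing the rectangle. A squarified layout would give less elongated rectangles at the cost of a more involved algorithm.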

Query
select month(cout), count(*) from spl0.inraw where year(cout) = 2011 and title like '%Harry Potter%' group by month(cout) order by month(cout);
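To make the query's logic explicit, here is a minimal Java sketch of the same aggregation over in-memory rows (Java because Processing is Java-based). The `Checkout` record, the `countByMonth` method, and the sample rows are hypothetical; the real data lives in the `spl0.inraw` table. Title matching is case-insensitive here, which is an assumption about how the database compares strings.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class MonthlyCheckouts {
    // Hypothetical in-memory stand-in for one row of spl0.inraw: title and checkout time.
    public record Checkout(String title, LocalDateTime cout) {}

    // Mirrors: select month(cout), count(*) ... where year(cout) = year
    //          and title like '%pattern%' group by month(cout)
    public static Map<Integer, Integer> countByMonth(Iterable<Checkout> rows, int year, String pattern) {
        String needle = pattern.toLowerCase(Locale.ROOT);
        Map<Integer, Integer> counts = new TreeMap<>(); // TreeMap keeps months in ascending order
        for (Checkout c : rows) {
            boolean matches = c.cout().getYear() == year
                    && c.title().toLowerCase(Locale.ROOT).contains(needle);
            if (matches) {
                counts.merge(c.cout().getMonthValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical rows, for illustration only.
        List<Checkout> rows = List.of(
                new Checkout("Harry Potter and the Goblet of Fire", LocalDateTime.of(2011, 7, 2, 10, 0)),
                new Checkout("Harry Potter and the Chamber of Secrets", LocalDateTime.of(2011, 7, 9, 14, 30)),
                new Checkout("Harry Potter and the Prisoner of Azkaban", LocalDateTime.of(2011, 8, 1, 9, 15)),
                new Checkout("Harry Potter and the Goblet of Fire", LocalDateTime.of(2010, 7, 2, 10, 0)));
        System.out.println(countByMonth(rows, 2011, "Harry Potter")); // prints {7=2, 8=1}
    }
}
```

Changing the pattern argument to another series title gives the monthly counts for that series.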

Results and Analysis
I analyzed the checkout patterns of CDs and DVDs belonging to the movie series that matched the top 10 grossing series in the movies data set, and found several distinct patterns.

For Pirates of the Caribbean: On Stranger Tides, checkouts of the series peaked in the month just after the new film's release and then gradually decreased. Checkouts for the X-Men series kept increasing during the year until one month after the release of the new installment, X-Men: First Class, when they decreased; this suggests that the previous installments remained popular. Rise of the Planet of the Apes showed a different pattern: checkouts of the series rose before the new film's release, suggesting that people wanted to catch up on the earlier films, since the previous installment had been released in 2001, more than ten years earlier. In contrast, the previous installments of the other series were relatively recent. The pattern for Twilight: Breaking Dawn is harder to interpret, since checkouts kept increasing during the first half of the year but decreased as the movie's release approached.

It would be interesting to map these findings onto a longer timeline that captures checkout patterns around the releases of more installments of each series, to see how the series fare against each other and how their checkouts correlate with their total gross rankings.
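One way to quantify the "peak after release" versus "rise before release" patterns discussed above is to compare the month of peak checkouts with the release month. This is a minimal sketch, assuming twelve monthly counts like those returned by the query (January first); the class, methods, counts, and release month are hypothetical and were not part of the original project.

```java
public class PeakAnalysis {
    // Returns the 1-based month with the most checkouts; ties go to the earliest month.
    public static int peakMonth(int[] monthlyCounts) {
        if (monthlyCounts.length == 0) {
            throw new IllegalArgumentException("no monthly counts");
        }
        int best = 0;
        for (int i = 1; i < monthlyCounts.length; i++) {
            if (monthlyCounts[i] > monthlyCounts[best]) {
                best = i;
            }
        }
        return best + 1;
    }

    // Months from the release to the checkout peak: positive means checkouts
    // peaked after the release, negative means they peaked before it.
    public static int peakOffset(int[] monthlyCounts, int releaseMonth) {
        return peakMonth(monthlyCounts) - releaseMonth;
    }

    public static void main(String[] args) {
        // Hypothetical monthly counts (Jan..Dec) and release month, not the project's data.
        int[] counts = {5, 6, 8, 9, 12, 20, 15, 11, 9, 7, 6, 5};
        int releaseMonth = 5;
        System.out.println(peakMonth(counts) + " " + peakOffset(counts, releaseMonth)); // prints "6 1"
    }
}
```

A single peak month is a coarse summary; on a longer timeline, comparing average checkouts in a window before and after each release would be more robust to noise.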


Code
I implemented the visualization in Processing.