2D Matrix: Harry Potter Books vs Movie Checkouts
MAT 259, 2015
Nataly Moreno
Concept
The goal was to make a comparison over the number of checkouts for all the Harry Potter books and movies and see which ones were checked out more frequently at the Seattle Public Library. Did people prefer books over movies? Were the books and/or movies more popular during a certain time period?
Query
SELECT
YEAR(cout),
MONTH(cout),
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%sorcerers stone%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book1,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%sorcerers stone%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie1,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%chamber of secrets%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book2,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%chamber of secrets%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie2,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%prisoner of azkaban%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book3,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%prisoner of azkaban%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie3,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%goblet of fire%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book4,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%goblet of fire%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie4,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%order of the phoenix%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book5,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%order of the phoenix%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie5,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%half%' AND title LIKE '%prince%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book6,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%half%' AND title LIKE '%prince%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie6,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN 1
ELSE 0
END) AS Book7,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title LIKE '%part 1%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie7,
SUM(CASE
WHEN (title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title LIKE '%part 2%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN 1
ELSE 0
END) AS Movie8
FROM spl2.inraw
WHERE
DATE(cout) >= '2005-1-1'
GROUP BY YEAR(cout) , MONTH(cout)
Preliminary sketches
In order to see the results clearly happening over time, the number of checkouts were put onto a 2D grid using pastel colors with transparencies. The y-axis displayed a book or movie's data in chronological order from top to bottom. The x-axis showed the data by year, every cell within a year represents a month. Initially, all the data that could be found was displayed onto the grid. The colors are in RGBA format; the red is affected by the numerical month value, the green by whether the data is from a book or movie, the blue by the year, and the alpha by the number of checkouts, more checkouts means a more solid color, while less means the color is more clear.
Process
The font was changed to a more professionally used font and new color schemes were explored. Additionally, from the initial results, it could be seen that there was no data for 2014 and for any year before 2005. The data for year 1970 was inaccurate and a mistake in the database.
Final result
Code