Linear Frequency
MAT 259, 2012
Ankit Srivastava

The idea behind this visualization is to leverage the fact that data can visualize itself beautifully. I feel that letting the data visualize itself can lead us to patterns that exist in the Seattle Public Library database, such as long-term checkouts, multiple checkouts, varied check-in patterns, and much more.

The bivariate graph has Time of Day on the Y-axis and Day of the Year on the X-axis. The X-axis covers a period of three years, 2009 to 2011. I wanted to see whether patterns exist when we consider the least popular Dewey subcategory for each Dewey category. For each library item that matches the query criteria, a line is drawn from (x1, y1), corresponding to the checkout time, to (x2, y2), corresponding to the check-in time. The lightness of the line is determined by the duration of the checkout: the longer the checkout, the darker the line.
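The duration-to-lightness rule can be illustrated with a small Python sketch (the project itself used Processing). The linear mapping, the one-year cap `max_days`, and the gray levels 230 and 20 are assumptions chosen for illustration, not values taken from the original sketch.

```python
def line_gray(duration_days, max_days=365.0, lightest=230, darkest=20):
    """Map a checkout duration to a gray level (0 = black, 255 = white).

    Assumed linear mapping: longer checkouts give darker lines.
    max_days, lightest and darkest are illustrative defaults.
    """
    # Clamp the fraction so very long or negative durations stay in range.
    frac = min(max(duration_days / max_days, 0.0), 1.0)
    # Interpolate from the lightest gray (short) to the darkest (long).
    return round(lightest + (darkest - lightest) * frac)
```

With these defaults, a same-day return is drawn in a pale gray and any checkout of a year or more is drawn at the darkest level.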

1. General History and other related areas has the most long-term activity, whereas General Statistics has the least.
2. In the Italian and Latin Literature category, many items were checked out at the same time but were checked in after very different durations.
3. The overall concentration of activity between 10am and 8pm corresponds to the library's working hours. Some anomalies outside these hours may correspond to online checkouts or renewals.

and Sketches
The idea behind this visualization is to find patterns in check-out and check-in for items belonging to various Dewey categories. Each Dewey category has its own color. Each point on the bivariate graph corresponds to a time of day and a day of the year.

For each library item, a line is drawn from (x1, y1) to (x2, y2), where (x1, y1) is the datetime the item was checked out and (x2, y2) is the datetime the item was checked back in.
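A minimal Python sketch of this mapping, assuming the X-axis spans 1 January 2009 to 31 December 2011 and the Y-axis runs from midnight at the top to midnight at the bottom. The plot size, the axis orientation, and the names `to_point` and `to_segment` are hypothetical.

```python
from datetime import datetime

START = datetime(2009, 1, 1)
END = datetime(2012, 1, 1)
TOTAL_DAYS = (END - START).days  # 1095 days in 2009-2011


def to_point(ts, width=1095, height=480):
    """Convert a datetime to (x, y): x = day of the period, y = time of day."""
    # x: whole days since the start of 2009, scaled to the plot width.
    day_index = (ts.date() - START.date()).days
    x = day_index * width / TOTAL_DAYS
    # y: seconds since midnight, scaled to the plot height (assumed orientation).
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    y = seconds * height / 86400
    return x, y


def to_segment(cout, cin, width=1095, height=480):
    """Return (x1, y1, x2, y2) for one checkout/check-in pair."""
    return to_point(cout, width, height) + to_point(cin, width, height)
```

For example, `to_segment(datetime(2009, 1, 1, 10, 0), datetime(2009, 1, 15, 18, 30))` gives the endpoints of the line for an item borrowed at 10am and returned two weeks later at 6:30pm.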

This kind of visualization can be used to answer questions such as:
1. Which Dewey category has a lot of short-term activity (the checkout duration, and therefore the line on the graph, is short)?
2. How does check-in behavior differ across Dewey categories?
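The first question can also be checked numerically. The Python sketch below computes, per Dewey category, the share of checkouts shorter than a cutoff. The seven-day cutoff, the record layout `(deweyClass, cout, cin)`, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def short_term_share(records, cutoff=timedelta(days=7)):
    """Return {category: fraction of checkouts shorter than cutoff}.

    records: iterable of (dewey_class, cout, cin) tuples (assumed layout).
    """
    totals = defaultdict(int)
    short = defaultdict(int)
    for dewey, cout, cin in records:
        # Top-level Dewey category: 0, 100, 200, ..., 900.
        category = int(dewey) // 100 * 100
        totals[category] += 1
        if cin - cout < cutoff:
            short[category] += 1
    return {c: short[c] / totals[c] for c in totals}
```

A category whose share is close to 1.0 would appear on the graph as a cluster of short lines.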

To make it look less cluttered (all transactions in a year could lead to a very cluttered graph), I plan to filter for specific titles using a keyword that is neutral to all Dewey categories. I will start this visualization with two years of data (2010 and 2011) and then try to extend it to the whole dataset.

Basic interactivity that could be added to this graph:
1. The mouse pointer shows the datetime at its position.
2. Keys 1, 2, 3, ..., 0 toggle each Dewey category (to hide or display its data).
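The key-based toggle could work roughly as in the Python sketch below. The mapping of digit k to Dewey category k × 100 and the `visible` dictionary are assumptions; the original key bindings may differ.

```python
def new_visibility():
    """All ten Dewey categories (0, 100, ..., 900) start out visible."""
    return {c: True for c in range(0, 1000, 100)}


def toggle_category(visible, key):
    """Flip the visibility of the category bound to a digit key.

    Assumed binding: key '3' controls the 300s, key '0' the 000s.
    Any other key leaves the state unchanged.
    """
    if len(key) == 1 and key in "0123456789":
        category = int(key) * 100
        visible[category] = not visible[category]
    return visible
```

The drawing loop would then skip every line whose category is marked as not visible.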

select FLOOR(deweyClass/10)*10 subgroup, count(*)
from spl0.inraw
where deweyClass is not null and deweyClass <> 'null'
group by 1
order by 2 asc;

select FLOOR(deweyClass/10)*10 subgroup, title, cout, cin
from spl0.inraw
where deweyClass is not null and deweyClass <> 'null'
  and itemtype like '%bk'
  and TIMESTAMPDIFF(HOUR, cout, cin) > 0
  and year(cout) >= 2009 and year(cin) <= 2011
  and FLOOR(deweyClass/10)*10 in (0, 150, 290, 300, 420, 590, 640, 780, 810, 910);
Fetch the smallest subgroup in each Dewey category
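The first query only counts items per subgroup; picking the smallest subgroup within each category is a separate step. A Python sketch of that step, assuming the Dewey numbers are already loaded into a list (the function name and tie-breaking rule are illustrative):

```python
import math
from collections import Counter


def smallest_subgroups(dewey_values):
    """Return {category: least-populated subgroup} for the given Dewey numbers.

    Subgroup = FLOOR(dewey / 10) * 10, category = subgroup rounded down to 100s.
    Only subgroups that occur in the data are considered; ties go to the
    lower subgroup number.
    """
    counts = Counter(math.floor(d / 10) * 10 for d in dewey_values)
    best = {}
    for subgroup, n in sorted(counts.items()):
        category = subgroup // 100 * 100
        if category not in best or n < best[category][1]:
            best[category] = (subgroup, n)
    return {c: s for c, (s, _) in best.items()}
```

The resulting subgroup numbers play the same role as the list passed to `in (...)` in the second query.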

Result and

I used Processing.

Run in Browser

Source Code

The visualization has the following interactions:
1. Filter each Dewey category using the number keys 0-9.
2. Toggle between legend and graph using L.
3. Filter out checkouts below the threshold (duration less than 120 days) using T.
4. Clear the graph using ~.
5. Toggle the grid using G.
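The threshold filter on T can be read as hiding every checkout shorter than 120 days. A Python sketch under that reading, with records assumed to be `(cout, cin)` datetime pairs and a hypothetical function name:

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(days=120)  # cutoff stated in the interaction list


def apply_threshold(records, enabled):
    """Return the records to draw.

    records: list of (cout, cin) datetime pairs (assumed layout).
    enabled: True when the T filter is switched on.
    """
    if not enabled:
        return list(records)
    # Keep only long-term checkouts; shorter ones are filtered out.
    return [(cout, cin) for cout, cin in records if cin - cout >= THRESHOLD]
```

With the filter on, only the long-term checkouts remain, which makes the darkest lines easier to compare across categories.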