Visualizing the Items Acquired by Seattle Public Library over Time
MAT 259, 2019
I'm interested in one particular metadata - itemNumber. It is a unique and incrementing id that gets assigned and incremented each time a new item gets acquired. I want to plot the itemNumber as well as the deweyClass of each item over time to see if there are any interesting patterns.
This query returns itemNumber, bibNumber, itemType, title and deweyClass of all the items sorted by itemNumber. It takes about 29 seconds and returns 2,391,281 results.
It's worth noting that there are duplicates with the same item number, and only differs slightly in their titles. I removed them later when loading into processing. After removing the duplicates, there remain 2,326,621 records.
Preliminary sketches and process
I've made a few updates to make the visualization clearer and the program more user-friendly. Changes include:
- hover text shows dewey class and title together
- hover text now adjusts according to position and won't be out of canvas
- change background and legend colors
- make the circle smaller
- add comment section and more comments
Below are some screenshots.
From the graph, we can clearly see that there are distinct vertical and horizontal lines that are missing the dots. Horizontal lines mean that some particular Dewey classes are never bought, in this case, 28 - 50 in Computer Science, 134 - 154 in Philosophy, 302 - 307 in Social Sciences, 917 - 929 are some of the examples. Vertical lines mean that there are some item numbers that are missing. However, it's really hard to find the exact purchase date of the item, since I wasn't even sure if the item number is evenly distributed. Therefore, I added the hover text function, so that when the mouse is over some dot, it will show the title of that item. I think this can give more information about the items and we can use the title to get the publish date of the item.
I also really care about the performance of the program. After finishing the project, I spent some time to try to optimize my program further. Initially, I put my drawing function inside draw(), which runs in a loop. Because there are about 2.3 million circles that need to be drawn on the canvas, it takes about 30 seconds to redraw each time, and it makes keyboard and mouse interaction practically impossible since all the time are spent on drawing. After some research, I came up with using PGraphics. It allows me to draw to a buffer and save it, and later I can just use the content from the buffer without needing to redraw the frame. Therefore I used 4 PGraphics, to save the full, the book, the media, and the misc graph individually when the application starts up. I also put each of the drawing function in separate threads so that the operation can be done in parallel. This really improved the performance of my application. Currently, it takes about 30 seconds on startup to draw the four graphs, and then it takes no time to redraw and switch between them. Of course, I can further divide the drawing area into smaller ones and use more threads to further accelerate if I feel the need to.
I thought about adding the checkout time of each item, for example, if I can get the first checkout time of each item, I can then use that as an indication of approximately when the item is purchased. I wrote two versions of the query to do that, as shown below. However, they are all too slow to run, and I couldn't finish running them in time. I'm hoping to find new ways to go about it.