The Slow Readers of Seattle Public Library (SPL)
MAT 259, 2020
Evgeny Noi
Concept
My main goal in visualizing the Seattle Public Library (SPL) data was to bring locational
information into the investigation. In my previous project I identified and joined the locational attributes of 11,000 books
to record the branch where each book is stored. Another crucial element of the proposed visualization
was representing the data at the finest level of detail. That is, the base unit of analysis should be the
most basic one: individual check-ins or check-outs. Overall, around 130,000 identifiable check-ins/outs over the course of
3 years (2012-12-31 – 2015-12-31) were found in the database for the 11,000 books, and these were then put through a simple exploratory analysis.
My first design decision was to switch from p5.js to standalone Processing because of the volume of the data.
Query
select *
from spl_2016.inraw
where (cout>'2012-12-31' and cout<'2015-12-31') and
CONCAT(bibNumber,collcode,itemtype) in ('261cs9rarbk',
'813canfacbk',
-- insert unique identifiers from the attached sql file (11,000 lines of code).
);
Preliminary sketches
I started out working from the follow-along examples from the class. From left to right, I mapped a spot
for each individual book. The vertical axis shows the time span of the loan, measured from the first day
of the observation period (2012-12-31). The z-axis depicts the time of day at which a book was either
checked out or returned.
Process
Because the granular analysis was based on individual check-in trajectories in time, I decided to project
each individual book on the horizontal (X) axis. To do this, a new variable ‘ranker’ was generated as a re-ordered
version of the barcodes (see code below), where the books were sorted according to the branch in which they
are stored. This ordering made patterns in the data easier to tell apart, with visible groupings.
# generate unique identifier as reordered version of barcodes
dur5['ranker'] = dur5.barcode.rank(method='dense')
print(dur5.ranker.max())
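The snippet above ranks the barcodes directly. As a minimal sketch of how a branch-grouped ordering could be produced, assuming the table carries a hypothetical `branch` column next to `barcode` and that each barcode belongs to one branch (the column name and the toy values are assumptions, not taken from the project files), the books can be sorted by branch first and then numbered:

```python
import pandas as pd

# Toy stand-in for the transaction table; `branch` is a hypothetical
# column name and all values are made up for illustration.
dur5 = pd.DataFrame({
    "barcode": [30, 10, 20, 10, 40],
    "branch": ["net", "cen", "net", "cen", "bal"],
})

# One row per book, ordered by branch and then by barcode.
books = (
    dur5[["branch", "barcode"]]
    .drop_duplicates()
    .sort_values(["branch", "barcode"])
    .reset_index(drop=True)
)

# The position in that ordering becomes the X-axis slot (1-based),
# so books from the same branch sit next to each other.
books["ranker"] = books.index + 1

# Attach the slot to every transaction of the same book.
ranked = dur5.merge(books, on=["branch", "barcode"], how="left")
print(ranked)
```

With this ordering, consecutive `ranker` values belong to the same branch, which is what would produce contiguous bands along the X axis.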
To identify the duration of a book loan, I calculated the difference between the check-in and check-out dates.
A loan is a continuous process, and as such, it is traditionally represented as a line.
To position each line on the vertical (Y) axis in relation to other books, a new scale of days
from min(check-out) to max(check-in) was generated. While the mean loan duration is 16 days, there are
48,000 transactions with a loan period above the mean. After generating several visualizations, and in
order to minimize clutter on the canvas, the duration threshold for filtering was set to 75 days.
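The duration and day-scale steps described here might look roughly like the following sketch. The column name `cout` comes from the query; `cin` is an assumed name for the check-in column, the rows are toy data, and keeping only loans longer than the threshold is an assumption about the direction of the filter.

```python
import pandas as pd

# Toy transactions; `cin` is an assumed column name, values are made up.
loans = pd.DataFrame({
    "cout": pd.to_datetime(
        ["2013-01-02 10:15", "2013-01-05 16:40", "2013-03-01 12:00"]
    ),
    "cin": pd.to_datetime(
        ["2013-01-20 11:00", "2013-04-30 09:30", "2013-03-10 18:20"]
    ),
})

# Work on calendar dates only, so time of day does not affect day counts.
out_day = loans["cout"].dt.normalize()
in_day = loans["cin"].dt.normalize()

# Loan duration in whole days.
loans["duration"] = (in_day - out_day).dt.days

# Day scale: days elapsed since the earliest check-out in the data.
origin = out_day.min()
loans["y_start"] = (out_day - origin).dt.days
loans["y_end"] = (in_day - origin).dt.days

# Assumed filter direction: keep only loans longer than the threshold.
THRESHOLD_DAYS = 75
long_loans = loans[loans["duration"] > THRESHOLD_DAYS]
print(long_loans)
```

Each remaining row then supplies the two Y coordinates (`y_start`, `y_end`) of one loan line.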
Finally, I decided to add time of day as a third dimension of the visualization, with a minimum at the opening
hour of the library and a maximum at the closing time. In doing so, I decided to add curvature and convert
the simple straight-line representations into curves drawn with curveVertex (as per Processing terminology), where the degree of
curvature depends on the middle point, which was warped on the y-axis (see code in project folder for further
details). In the last step, I added points at the start and end of each book loan to better differentiate the
duration.
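As a rough illustration of the geometry only, the sketch below computes the three points of one loan curve in Python. It is not the Processing code from the project folder: the opening and closing hours, the depth of the z-axis, the `bend` parameter, and the formula for displacing the middle point are all assumptions made for this example.

```python
from datetime import datetime

# Assumed values; the actual library hours and canvas depth are not
# given in the text.
OPEN_HOUR, CLOSE_HOUR = 10.0, 20.0
DEPTH = 500.0


def remap(value, in_lo, in_hi, out_lo, out_hi):
    """Linear rescale, the same idea as Processing's map()."""
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def hour_of_day(ts):
    """Time of day as a fractional hour, e.g. 15:30 -> 15.5."""
    return ts.hour + ts.minute / 60


def loan_curve(x, y_start, y_end, cout, cin, bend=0.25):
    """Return start, middle and end points (x, y, z) of one loan.

    The middle point is pushed along the y-axis by `bend` times the
    loan length; this displacement rule is a hypothetical choice.
    """
    z_start = remap(hour_of_day(cout), OPEN_HOUR, CLOSE_HOUR, 0.0, DEPTH)
    z_end = remap(hour_of_day(cin), OPEN_HOUR, CLOSE_HOUR, 0.0, DEPTH)
    y_mid = (y_start + y_end) / 2 + bend * (y_end - y_start)
    z_mid = (z_start + z_end) / 2
    return [(x, y_start, z_start), (x, y_mid, z_mid), (x, y_end, z_end)]


points = loan_curve(
    x=42, y_start=0, y_end=18,
    cout=datetime(2013, 1, 2, 10, 0),
    cin=datetime(2013, 1, 20, 15, 0),
)
print(points)
```

In Processing, curveVertex() treats the first and last points of a shape as control points, so the start and end points would typically be passed twice for the curve to be drawn all the way to both ends.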
I do not have 'work-in-progress' documentation for this visualization. Suffice it to say there were
many errors and a lot of debugging.
Final result
Code