Giving Adresses to Books
MAT 259, 2020
Evgeny Noi
Concept
For my project with Seattle Public Library (SPL) I decided to visualize a journey of a
book in space and time. Unfortunately, the provided dataset does not have any information on the
readers or their places of residence, so I decided to search for the wise to acquire such information.
SPL publishes its data on City of Seattle Open Data Portal, where library inventory may be found and
downloaded. The inventory file contains information on the resources owned by the library, including
the bibNumber from the Library of Congress, as well as the ItemType and ItemCollection that could be used to
geneate unique identifier to be joined to the spl_2016 database. Additionally, I found an ealier copy of the
inventory on the Portal (2017) on kaggle.com. Logically, it was interesting to compare the prescription of
books to branches from these two time frames: 2017 and 2020. Because inventory files do not have a barcode or
other unique identifiers for a join, I created a unique identifier for both inventory and check-ins/outs
by concatenating ‘bibNumber’, ‘collcode’ / ‘ItemCollection’, and ‘ItemType’ (see Python code extract).
Overall, there are 10,900 items that could be merged based on the aforementioned criteria.
I copied the identifiers into a generated SQL query and ran it on the database to build the following query
(see SQL code extract).
Query
import pandas as pd
# df2 is inventory 2017, df is inventory 2020
df2['tycobib'] = df2.BibNum.astype('str') + df2.ItemCollection + df2.ItemType
df['tycobib'] = df.BibNum.astype('str') + df.ItemCollection + df.ItemType
# drop any records that have duplicates given unique identifier tycobib
dfu = df.drop_duplicates('tycobib', keep=False)
dfu2 = df2.drop_duplicates('tycobib', keep=False)
merger2 = pd.merge(left = dfu, right = dfu2[['tycobib', 'ItemLocation', 'FloatingItem']],
on='tycobib', how='inner')
SELECT
*
FROM spl_2016.inraw
WHERE cout>'2016-12-31' AND
CONCAT(bibNumber,collcode,itemtype) in ('261cs9rarbk',
-- < ...> here 10,900 identifiers were inserted
'3304438canfacbk');
Preliminary sketches
Process
Final result
The resulting number of items (books) by locations is provided in the table and map below.
Code