Giving Adresses to Books
MAT 259, 2020
Evgeny Noi

Concept
For my project with Seattle Public Library (SPL) I decided to visualize a journey of a book in space and time. Unfortunately, the provided dataset does not have any information on the readers or their places of residence, so I decided to search for the wise to acquire such information.
SPL publishes its data on City of Seattle Open Data Portal, where library inventory may be found and downloaded. The inventory file contains information on the resources owned by the library, including the bibNumber from the Library of Congress, as well as the ItemType and ItemCollection that could be used to geneate unique identifier to be joined to the spl_2016 database. Additionally, I found an ealier copy of the inventory on the Portal (2017) on kaggle.com. Logically, it was interesting to compare the prescription of books to branches from these two time frames: 2017 and 2020. Because inventory files do not have a barcode or other unique identifiers for a join, I created a unique identifier for both inventory and check-ins/outs by concatenating ‘bibNumber’, ‘collcode’ / ‘ItemCollection’, and ‘ItemType’ (see Python code extract).

Overall, there are 10,900 items that could be merged based on the aforementioned criteria. I copied the identifiers into a generated SQL query and ran it on the database to build the following query (see SQL code extract).

Query

                import pandas as pd
                # df2 is inventory 2017, df is inventory 2020
                df2['tycobib'] = df2.BibNum.astype('str') + df2.ItemCollection + df2.ItemType
                df['tycobib'] = df.BibNum.astype('str') + df.ItemCollection + df.ItemType
                # drop any records that have duplicates given unique identifier tycobib
                dfu = df.drop_duplicates('tycobib', keep=False)
                dfu2 = df2.drop_duplicates('tycobib', keep=False)
                merger2 = pd.merge(left = dfu, right = dfu2[['tycobib', 'ItemLocation', 'FloatingItem']],
                on='tycobib', how='inner')                
            

                SELECT
                *
                FROM spl_2016.inraw
                WHERE cout>'2016-12-31' AND
                CONCAT(bibNumber,collcode,itemtype) in ('261cs9rarbk',
                -- < ...> here 10,900 identifiers were inserted
                    '3304438canfacbk');
            

Preliminary sketches

Process

Final result
The resulting number of items (books) by locations is provided in the table and map below.






Code
More detailed report is available at the link
Report and final data file