Place Names in the Semantic Space
MAT 259, 2022
Zilong Liu

Concept
Place names carry rich geospatial semantics about places, and they can be extracted from texts such as titles and keywords. In this project, I explore how place names co-occur in geography-related items held by the Seattle Public Library. Rather than using a co-occurrence matrix, I built a 3D co-occurrence graph for visualization and interaction.

Query
The query below retrieves all items in the Dewey 900s (history and geography), along with their titles, subjects, and item types.
                    
SELECT 
    spl_2016.deweyClass.bibNumber,
    spl_2016.deweyClass.deweyClass,
    spl_2016.title.title,
    GROUP_CONCAT(DISTINCT spl_2016.subject.subject) AS 'subject',
    spl_2016.itemType.itemType
FROM
    spl_2016.deweyClass
        INNER JOIN
    spl_2016.title ON spl_2016.deweyClass.bibNumber = spl_2016.title.bibNumber
        INNER JOIN
    spl_2016.subject ON spl_2016.deweyClass.bibNumber = spl_2016.subject.bibNumber
        INNER JOIN
    spl_2016.itemToBib ON spl_2016.itemToBib.bibNumber = spl_2016.subject.bibNumber
        INNER JOIN
    spl_2016.itemType ON spl_2016.itemToBib.itemNumber = spl_2016.itemType.itemNumber
WHERE
    (spl_2016.deweyClass.deweyClass >= 900)
        AND (spl_2016.deweyClass.deweyClass < 1000)
GROUP BY spl_2016.deweyClass.bibNumber , spl_2016.deweyClass.deweyClass , spl_2016.title.title , spl_2016.itemType.itemType
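
To make the shape of the result concrete, the filtering and grouping steps of the query can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, titles, subjects, and item types below are made up, and group_subjects is a hypothetical helper, not part of the project code. It keeps Dewey classes from 900 up to (but not including) 1000 and joins the subjects of each group with commas, which is the default GROUP_CONCAT separator.

```python
from collections import defaultdict

# Hypothetical joined rows: (bibNumber, deweyClass, title, subject, itemType).
rows = [
    (1, 917.97, "Seattle walks", "Seattle (Wash.) Guidebooks", "acbk"),
    (1, 917.97, "Seattle walks", "Walking Washington State Seattle", "acbk"),
    (2, 641.5, "Weeknight cooking", "Cooking", "acbk"),
    (3, 940.53, "The war years", "World War 1939 1945 France", "acdvd"),
]

def group_subjects(rows):
    """Keep Dewey classes 900-999 and join the subjects of each group with commas."""
    groups = defaultdict(list)
    for bib, dewey, title, subject, item_type in rows:
        if 900 <= dewey < 1000:
            # same grouping key as the GROUP BY clause of the query
            groups[(bib, dewey, title, item_type)].append(subject)
    return {key: ",".join(subjects) for key, subjects in groups.items()}

grouped = group_subjects(rows)
```

Each key of grouped corresponds to one row of the query result, and its value to the concatenated subject column that is later passed to toponym recognition.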
                    
                

Toponym Recognition
I used spaCy (an open-source Python package for natural language processing) to carry out toponym recognition on the subjects of the retrieved items.

                    
import spacy
import pandas as pd
import numpy as np

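## the function to extract toponyms
def extract_toponym(text):
    """Return the unique place names found in text, sorted and tab-separated."""
    ## entity labels treated as toponyms; the English pipeline emits GPE, LOC and FAC,
    ## and the other two are kept in case a model with a different label scheme is used
    toponym_labels = {"GPE", "FACILITY", "LOC", "FAC", "LOCATION"}
    nlp_result = nlp(str(text))
    toponym_list = [ent.text for ent in nlp_result.ents if ent.label_ in toponym_labels]
    ## np.unique removes duplicates and sorts the names
    return '\t'.join(np.unique(toponym_list))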
            
## load the pre-trained English pipeline
nlp = spacy.load("en_core_web_lg")

## read query result
df_spl = pd.read_csv('3d_query_result_2016.csv')
## df_spl = df_spl.head(5000)

## extract toponyms from the subject column
df_spl['toponyms_subject'] = df_spl['subject'].apply(extract_toponym)

## output dataframe to csv
df_spl.to_csv('3d_query_result_2016_ner.csv')
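
The tab-separated toponyms produced above are the raw material for the co-occurrence graph. The following is a minimal sketch of how edges could be counted from that column; it is not the Processing code used in the project. The function name cooccurrence_edges and the sample values are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(toponym_column):
    """Count how many items mention each unordered pair of toponyms."""
    edges = Counter()
    for cell in toponym_column:
        # each cell holds the tab-separated toponyms of one item
        names = sorted({name for name in cell.split("\t") if name})
        # every pair of distinct toponyms in the same item is one co-occurrence
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges

# Hypothetical values of the toponyms_subject column, one string per item.
column = ["Seattle\tWashington", "Seattle\tWashington\tAlaska", "Alaska", ""]
edges = cooccurrence_edges(column)
```

Each key of edges is a pair of toponyms (an edge of the graph) and each value is the number of items in which the pair appears together. If the column is read back from the CSV with pandas, empty cells are parsed as NaN, so they would need to be replaced with empty strings (for example with fillna('')) before calling this function.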
                    
                

Visualization and Interaction
Place names are positioned randomly in the semantic space. When you hover over a toponym, the toponyms that co-occur with it are highlighted, along with the items in which they are mentioned. The number of these items and all locations extracted from them are also displayed.
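
The layout and hover behavior can be sketched in a few lines of Python. This is an assumption-laden illustration of the logic only, not the Processing implementation: random_layout, hover, the cube size, the seed, and the sample items are all hypothetical.

```python
import random

def random_layout(toponyms, size=100.0, seed=0):
    """Assign each toponym a random (x, y, z) position inside a cube of side `size`."""
    rng = random.Random(seed)
    return {name: tuple(rng.uniform(-size / 2, size / 2) for _ in range(3))
            for name in toponyms}

def hover(toponym, items):
    """Return the items mentioning `toponym` and the toponyms that co-occur with it."""
    # items maps an item id to the set of toponyms extracted from that item
    hits = {item_id: names for item_id, names in items.items() if toponym in names}
    # union of all toponyms in the matching items, minus the hovered one
    cooccurring = set().union(*hits.values()) - {toponym}
    return hits, cooccurring

# Hypothetical items and their extracted toponyms.
items = {101: {"Seattle", "Washington"}, 102: {"Seattle", "Alaska"}, 103: {"Alaska"}}
layout = random_layout(["Seattle", "Washington", "Alaska"])
hits, cooccurring = hover("Seattle", items)
```

Here len(hits) gives the number of items to display, cooccurring gives the toponyms to highlight, and layout supplies the 3D coordinates at which each name would be drawn.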




Code
All work was developed in Processing.
Zilong_Liu_placeGraphMain.zip