Detecting Outliers in Checkout Periods at the SPL:
Twilight by Stephenie Meyer
MAT 259, 2023
Brianna Griffin
Concept
I will look for irregularities in the data set by identifying outliers. In particular, this project analyzes the checkout duration, in days, for the first
book in the Twilight series by Stephenie Meyer, namely "Twilight". Published in 2005, the book is set in Forks, Washington, around 100 miles
away from Seattle. Thus, it is heavily represented in the Seattle Public Library's records, with many years of checkout and check-in data.
Specifically, I would like to answer the following questions:
- Are there any observations that do not fit the rest of the data set?
- When and why do these observations occur?
- How do the outliers affect statistics of the sample?
Queries
Below are some of the queries I used to identify outliers and explore the data.
1. The first chunk of code counts the outliers in my data sample.
An observation is considered an outlier if it lies more than 3 standard deviations
from the mean. In the SQL code below, `time_CO` measures the time in days between checkout and
return of the book. The code thus identifies data points as outliers if they have a `time_CO`
greater than 96 days.
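As a language-neutral illustration of the three-standard-deviation rule described above, here is a minimal Python sketch. It is not the SQL used in this project. The function name `three_sigma_outliers`, the variable `time_co`, and the list of durations are all hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def three_sigma_outliers(durations):
    """Return (lower, upper, outliers) under the three-standard-deviation rule."""
    mu = mean(durations)
    # Population standard deviation; this choice is an assumption of the sketch.
    sigma = pstdev(durations)
    lower = mu - 3 * sigma
    upper = mu + 3 * sigma
    # Keep every value that falls outside the [lower, upper] band.
    outliers = [d for d in durations if d < lower or d > upper]
    return lower, upper, outliers


# Made-up checkout durations in days (hypothetical values, not SPL data).
time_co = [14, 21, 7, 21, 18, 21, 10, 21, 3, 21, 15,
           21, 20, 12, 21, 19, 21, 8, 21, 17, 120]

lower, upper, outliers = three_sigma_outliers(time_co)
print(f"lower bound: {lower:.1f} days, upper bound: {upper:.1f} days")
print("outliers:", outliers)
```

In this made-up sample the lower bound is negative. Since a checkout duration cannot be negative, only the upper bound can flag anything, which is consistent with describing the rule through a single upper cutoff such as the 96 days above.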