Outliers
MAT 259, 2015
Daniel Imberman

Concept
For this project I wanted to be able to further research my look into finding outliers through comparison of normalized averages and standard deviations

Query
SELECT hour(day1.cdate),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 1 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 2 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 3 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 4 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 5 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 6 then 1 else 0 end),
sum(case when deweyClass>=810 and deweyClass<820 and dayofweek(day1.cdate) = 7 then 1 else 0 end)
from (SELECT cout as cdate, deweyClass FROM ( SELECT cout, deweyClass FROM spl2.inraw WHERE DATE(cout) >= '2013-03-01' AND DATE(cout) < '2013-04-01' ) as outday1 where hour(outday1.cout) >=10 and hour(outday1.cout)<= 20 ) as day1 group by hour(day1.cdate) order by hour(day1.cdate)

Preliminary sketches
I found that by using a color pallete where the insignificant data was the same color as the background, it became much easier to find the outliers. I also found that having the seperate modes helped significantly in understanding the data.

chronological

sorted by average

sorted by standard deviation



Process

I was able to find a better color scheme that would look more similar to a heat mapping however I was having an issue in that it was hard to tell the relevance of any of the color. What's dense? what's sparse? everything seemed equally important and this simply was not the case.

Final result
sorted by average

sorted by standard deviation


Code
I used Processing.

Control: A = sort by average
C = chronological
S = sort by standard deviation
Source Code + Data