Final Project
MAT 259, 2018
Echo Theohar
Concept
I chose to work with George Legrady's dataset, which he sourced from the Centre Pompidou, an arts institution based in Paris, France. The data appeared to be a small excerpt of a much larger set. It included query histories from popular French news sources such as Le Monde and Le Figaro, search queries from dailymotion.com, and general search queries from Bing. Each record was described by several fields: date, time, year, URL, and the query string. I chose to focus on the general search queries from bing.com.
Query
Here is an example of the code used to parse the CSV files (all tested in Processing 3):
// Walk every row of the table and tally exact matches for each tracked term.
for (int i = 0; i < rows2; i++) {
  // Read the value stored in column index 4 of row i.
  element = all_data2.getString(i, 4);
  if (element.equals("google")) {
    Google_count += 1;
  }
  if (element.equals("iranianuk")) {
    IranianUK_count += 1;
  }
  if (element.equals("iran")) {
    Iran_count += 1;
  }
  if (element.equals("facebook")) {
    Facebook_count += 1;
  }
  if (element.equals("instagram")) {
    Instagram_count += 1;
  }
  if (element.equals("bbc")) {
    BBC_count += 1;
  }
  if (element.equals("gmail")) {
    Gmail_count += 1;
  }
  if (element.equals("hotmail")) {
    Hotmail_count += 1;
  }
  if (element.equals("yahoo")) {
    Yahoo_count += 1;
  }
  if (element.equals("youtube")) {
    Youtube_count += 1;
  }
  if (element.equals("senego")) {
    Senego_count += 1;
  }
  if (element.equals("seneweb")) {
    Seneweb_count += 1;
  }
  if (element.equals("le monde")) {
    LeMonde_count += 1;
  }
  if (element.equals("le figaro")) {
    LeFigaro_count += 1;
  }
  if (element.equals("le parisien")) {
    LeParisien_count += 1;
  }
  if (element.equals("airbnb")) {
    AirBNB_count += 1;
  }
}

// Print the final tally for each term.
println("google:" + Google_count);
println("iranianuk:" + IranianUK_count);
println("iran:" + Iran_count);
println("facebook:" + Facebook_count);
println("instagram:" + Instagram_count);
println("bbc:" + BBC_count);
println("gmail:" + Gmail_count);
println("hotmail:" + Hotmail_count);
println("yahoo:" + Yahoo_count);
println("youtube:" + Youtube_count);
println("senego:" + Senego_count);
println("seneweb:" + Seneweb_count);
println("le monde:" + LeMonde_count);
println("le figaro:" + LeFigaro_count);
println("le parisien:" + LeParisien_count);
println("airbnb:" + AirBNB_count);
}
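The same exact-match tally can probably be written more compactly with a map keyed by term instead of one counter variable per term. The following is only a sketch in plain Java, not the project's actual code: the class name TermCounter and the method countTerms are hypothetical, and a plain list of strings stands in for the column read from the Processing table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original sketch.
class TermCounter {

    // Counts how many values are exactly equal to each tracked term.
    // A LinkedHashMap keeps the terms in the order they were listed.
    static Map<String, Integer> countTerms(List<String> values, List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.put(term, 0);
        }
        for (String value : values) {
            // Only tracked terms are counted; everything else is ignored.
            if (counts.containsKey(value)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for one column of the CSV.
        List<String> terms = Arrays.asList("google", "facebook", "le monde");
        List<String> column = Arrays.asList("google", "le monde", "google", "weather");
        countTerms(column, terms)
            .forEach((term, n) -> System.out.println(term + ":" + n));
    }
}
```

Adding a new term then means adding one string to the list rather than a new variable, a new if block, and a new println call.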
Preliminary sketches
I was curious about how the institution collected the data, and I wanted to visualize it in a way that implicated the user in the act of viewing. I decided to use the OpenCV library with my laptop's built-in camera to capture live video, and to use that video as the input for moving between the first two columns of the data: the session ID and the date of access.
Process
A large portion of the project involved building a text parser that could act similarly to a MySQL query, in the sense that it could isolate specific terms that I judged to occur frequently in the Bing searches. The parser worked by taking a string of text and cross-referencing it against the data in three CSV documents, which were split up by month.
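One way such a cross-reference across the monthly files might look is sketched below in plain Java. Everything named here is an assumption made for illustration: the class MonthlyQueryParser, the method countByMonth, the month labels, and the sample queries. It also uses case-insensitive substring matching, which differs from the exact equals() comparison in the excerpt above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a per-month term tally; not the project's parser.
class MonthlyQueryParser {

    // For each month, counts how many queries contain each tracked term.
    static Map<String, Map<String, Integer>> countByMonth(
            Map<String, List<String>> queriesByMonth, List<String> terms) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> month : queriesByMonth.entrySet()) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String term : terms) {
                counts.put(term, 0);
            }
            for (String query : month.getValue()) {
                // Normalize so "Facebook" and " facebook " match the same term.
                String normalized = query.toLowerCase(Locale.ROOT).trim();
                for (String term : terms) {
                    if (normalized.contains(term)) {
                        counts.merge(term, 1, Integer::sum);
                    }
                }
            }
            result.put(month.getKey(), counts);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed stand-ins for the query column of each monthly CSV.
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("month1", Arrays.asList("Facebook login", "bbc persian"));
        data.put("month2", Arrays.asList("facebook", "iranianuk"));
        List<String> terms = Arrays.asList("facebook", "bbc", "iran");
        countByMonth(data, terms)
            .forEach((month, counts) -> System.out.println(month + " " + counts));
    }
}
```

Substring matching is a design choice with a cost: a query of "iranianuk" also counts toward "iran". If that overlap is unwanted, comparing the normalized query with equals() restores the stricter behavior.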
Final result
Overall, I had some trouble shifting between the different months and sheets of data because the frame rate was a bit too fast. The problem could not be solved simply by slowing down the frame rate; it hinges on conditional statements that still need to be ironed out. In the future, I would like to build upon this project and clean up the code, since I "brute forced" a lot of functionality that could probably be accomplished in a more elegant way.
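One conditional that might help is a time-based cooldown, so that a trigger held across many frames advances the sheet only once per interval. This is a sketch of that idea in plain Java, not a fix confirmed against the project: the class SheetSwitcher, its fields, and the cooldown value are all hypothetical. In a Processing sketch, the value of millis() could be passed in as the current time.

```java
// Hypothetical sketch of a cooldown gate for switching between data sheets.
class SheetSwitcher {
    private final int sheetCount;
    private final long cooldownMillis;
    private int current = 0;
    private long lastSwitch = 0;
    private boolean hasSwitched = false;

    SheetSwitcher(int sheetCount, long cooldownMillis) {
        this.sheetCount = sheetCount;
        this.cooldownMillis = cooldownMillis;
    }

    // Call once per frame. Advances to the next sheet only when the trigger
    // is active AND the cooldown since the last switch has elapsed.
    int update(boolean triggered, long nowMillis) {
        boolean cooledDown = !hasSwitched
            || nowMillis - lastSwitch >= cooldownMillis;
        if (triggered && cooledDown) {
            current = (current + 1) % sheetCount; // wrap around to sheet 0
            lastSwitch = nowMillis;
            hasSwitched = true;
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulate roughly 60 frames per second with the trigger held down.
        SheetSwitcher switcher = new SheetSwitcher(3, 1000);
        int previous = -1;
        for (long now = 0; now <= 3000; now += 16) {
            int sheet = switcher.update(true, now);
            if (sheet != previous) {
                System.out.println("t=" + now + "ms -> sheet " + sheet);
                previous = sheet;
            }
        }
    }
}
```

Because the gate depends on elapsed time rather than frame count, the drawing loop can keep running at full speed while the sheet changes stay readable.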
Code