Data Correlation
MAT 259, 2014
Li Zheng

Introduction
The main idea is to correlate two data sets covering 2005 to 2011. The first is the number of books checked out from the Seattle Public Library whose titles contain the keywords “China” or “Japan”. The second is the number of New York Times articles containing the same keywords. The library data is obtained by running an SQL query and storing the result in a .txt file. The New York Times data is obtained through the Article Search API provided by The New York Times; each response is stored in a JSONObject and then parsed. In the project, clicking the mouse or pressing a key switches between the Seattle Public Library results and the New York Times results.
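The view switch described above amounts to a small piece of state. The sketch below is written in Java, the language Processing is built on. It assumes exactly two views, one per data source, and the names `View` and `ViewToggle` are hypothetical. In a Processing sketch, `toggle()` would be called from the `mousePressed()` and `keyPressed()` event functions, and `draw()` would render whichever view is current.

```java
/** Hypothetical set of views: one per data source (assumption). */
enum View {
    LIBRARY, // Seattle Public Library checkout counts
    NYTIMES; // New York Times article counts

    /** Returns the following view, wrapping around after the last one. */
    View next() {
        View[] all = values();
        return all[(ordinal() + 1) % all.length];
    }
}

/** Holds the view currently on screen; starts on the library view. */
class ViewToggle {
    private View current = View.LIBRARY;

    /** Advances to the next view; meant to run on every click or key press. */
    void toggle() {
        current = current.next();
    }

    View current() {
        return current;
    }

    public static void main(String[] args) {
        ViewToggle t = new ViewToggle();
        System.out.println(t.current()); // LIBRARY
        t.toggle();
        System.out.println(t.current()); // NYTIMES
        t.toggle();
        System.out.println(t.current()); // LIBRARY
    }
}
```

Keeping the views in an enum means a third view could be added without touching the event handlers.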

Background and Sketches

Query
Seattle Public Library query:
select year(cout), month(cout), count(*)
from inraw
where cout >= '2005-01-01' and cout < '2012-01-01' and title like '%China%'
group by year(cout), month(cout)
order by year(cout), month(cout);

select year(cout), month(cout), count(*)
from inraw
where cout >= '2005-01-01' and cout < '2012-01-01' and title like '%Japan%'
group by year(cout), month(cout)
order by year(cout), month(cout);
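Because `group by` only returns months that have at least one matching checkout, a month with no checkouts is simply absent from the query output. One way to line the results up for drawing is a dense array with one slot per month from January 2005 to December 2011 (84 slots), where missing months stay at zero. The Java sketch below assumes each line of the .txt file holds year, month and count separated by whitespace (for example, tab-separated rows), and it skips header lines or anything else that does not parse. `MonthlyCounts` and its methods are hypothetical names.

```java
import java.util.List;

/** Sketch: turns rows of (year, month, count) into one value per month, 2005-2011. */
class MonthlyCounts {
    static final int FIRST_YEAR = 2005;
    static final int LAST_YEAR = 2011;
    static final int MONTHS = (LAST_YEAR - FIRST_YEAR + 1) * 12; // 84

    /** Maps (year, month) to an array index: Jan 2005 -> 0, Dec 2011 -> 83. */
    static int index(int year, int month) {
        return (year - FIRST_YEAR) * 12 + (month - 1);
    }

    /** Builds the dense series; months missing from the query output stay 0. */
    static int[] fromRows(List<String> lines) {
        int[] counts = new int[MONTHS];
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 3) continue; // assumed format: year, month, count
            try {
                int year = Integer.parseInt(f[0]);
                int month = Integer.parseInt(f[1]);
                int count = Integer.parseInt(f[2]);
                if (year < FIRST_YEAR || year > LAST_YEAR || month < 1 || month > 12) continue;
                counts[index(year, month)] = count;
            } catch (NumberFormatException e) {
                // Not a data row, e.g. a header such as "year(cout) month(cout) count(*)"
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("year(cout)\tmonth(cout)\tcount(*)", "2005\t1\t12", "2005\t3\t7");
        int[] counts = fromRows(rows);
        // February 2005 is absent from the rows, so its slot stays 0
        System.out.println(counts.length + " " + counts[0] + " " + counts[1] + " " + counts[2]); // 84 12 0 7
    }
}
```

The same layout can hold both library series and both New York Times series, so month `i` means the same calendar month in all four.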

New York Times Request:

String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;
String result = join(loadStrings(request), "");  // fetch the response once
JSONObject nytData = new JSONObject(result);      // parse it without requesting the URL a second time
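The request above covers a single date range. To get a monthly series comparable to the library data, one option is to issue one request per month. The Java sketch below builds those request strings for 2005 through 2011. It reuses the parameter names from the request above (`query`, `begin_date`, `end_date`, `api-key`), URL-encodes the search word, and assumes the dates are written as YYYYMMDD. The base URL and API key are placeholders supplied by the caller, and `NytRequests` is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Sketch: one request string per calendar month, January 2005 to December 2011. */
class NytRequests {
    static List<String> monthlyRequests(String baseURL, String word, String apiKey) {
        DateTimeFormatter yyyymmdd = DateTimeFormatter.BASIC_ISO_DATE; // e.g. 20080229
        String q = URLEncoder.encode(word, StandardCharsets.UTF_8);
        List<String> requests = new ArrayList<>();
        for (YearMonth m = YearMonth.of(2005, 1); !m.isAfter(YearMonth.of(2011, 12)); m = m.plusMonths(1)) {
            String beginDate = m.atDay(1).format(yyyymmdd);     // first day of the month
            String endDate = m.atEndOfMonth().format(yyyymmdd); // last day, leap years included
            requests.add(baseURL + "?query=" + q + "&begin_date=" + beginDate
                    + "&end_date=" + endDate + "&api-key=" + apiKey);
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> r = monthlyRequests("https://example.invalid/search", "China", "YOUR_API_KEY");
        System.out.println(r.size()); // 84
        System.out.println(r.get(0));
    }
}
```

Each string can then be passed to `loadStrings()` as in the request above. Which field of the JSON response holds the article count depends on the API version in use and is not shown here.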

Results
The visualization uses four colors, one per data series: New York Times articles containing ‘China’, articles containing ‘Japan’, Seattle Public Library checkouts whose titles contain ‘China’, and checkouts whose titles contain ‘Japan’. Clicking the mouse switches between the views. The checkout counts for titles containing ‘China’ and for titles containing ‘Japan’ are roughly the same throughout 2005 to 2011. In the New York Times, by contrast, there are consistently more articles about China than about Japan. The two outliers are also worth a brief look. For China, the outlier appears in August 2008, the month of the Beijing Olympic Games. For Japan, the outlier appears in March 2011, when the Tōhoku earthquake and tsunami caused the Fukushima nuclear disaster.
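Both observations above can be checked mechanically on the monthly series. The Java sketch below assumes each series is an array of 84 monthly values with January 2005 at index 0. `peakMonth` reports the month holding the maximum, a simple proxy for a single spike rather than a formal outlier test, and `alwaysGreater` tests whether one series exceeds another in every month. `SeriesChecks` and both method names are hypothetical, and the values in `main` are illustrative only, not the project's data.

```java
/** Sketch: simple checks on monthly series indexed from January 2005 (index 0). */
class SeriesChecks {
    /** Returns the month with the largest value as "YYYY-MM"; the earliest month wins ties. */
    static String peakMonth(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        int year = 2005 + best / 12;
        int month = best % 12 + 1;
        return String.format("%d-%02d", year, month);
    }

    /** True if a[i] > b[i] for every month i; both arrays must have the same length. */
    static boolean alwaysGreater(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] <= b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] toy = new int[84]; // illustrative values only, not the project's data
        toy[43] = 10;            // index 43 = August 2008
        System.out.println(peakMonth(toy)); // 2008-08
    }
}
```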

Code
I used Processing and the New York Times Article Search API.

Source Code