Data Correlation
MAT 259, 2014
Mohit Hingorani

Concept
To me, correlating data means making sense of two data sets at the same time and understanding the dynamics between them. For this assignment I explore trends in the emerging fields of ‘Data’ and ‘Big Data’. I use Article Search from the New York Times API to look for the word ‘Data’ and for its subset ‘Big Data’. In the Seattle Public Library data, I search for books whose titles contain ‘Data’ and ‘Big Data’. Comparing the two sources reveals a time lag between coverage in the news and checkouts at the library.

I look at data from 2011 through 2013, a 3-year period.

Labels:
I am going for a Swiss poster design style, featuring bold fonts and a minimalistic design. The data is arranged vertically (like a timeline) instead of the usual horizontal flow. I use a combination of simple bars and lines for the project. Hovering the mouse over a bar highlights the number of books or articles that the bar represents, for both SPL and NYT.
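The hover behaviour described above can be sketched in plain Java. This is only an illustration, not the project's actual code: the class name BarHitTest, the helper names, the layout parameters (topMargin, rowHeight) and the counts in main are all hypothetical. It assumes one bar per month, stacked top to bottom with equal height.

```java
public class BarHitTest {
    // Returns the index of the bar under the given mouse Y position,
    // or -1 if the mouse is above the first bar or below the last one.
    // Assumption: bars start at topMargin and each one is rowHeight pixels tall.
    static int barIndexAt(float mouseY, float topMargin, float rowHeight, int barCount) {
        if (rowHeight <= 0 || mouseY < topMargin) {
            return -1;
        }
        int index = (int) ((mouseY - topMargin) / rowHeight);
        return index < barCount ? index : -1;
    }

    // Builds the label shown for the hovered bar; empty when nothing is hovered.
    static String label(int index, int[] splCounts, int[] nytCounts) {
        if (index < 0 || index >= splCounts.length || index >= nytCounts.length) {
            return "";
        }
        return "SPL: " + splCounts[index] + " books, NYT: " + nytCounts[index] + " articles";
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only (not the project's results).
        int[] spl = {3, 5, 8};
        int[] nyt = {40, 55, 90};
        int hovered = barIndexAt(65f, 50f, 10f, spl.length);
        System.out.println(label(hovered, spl, nyt));
    }
}
```

In a Processing sketch, the same lookup would typically be called from draw() with the built-in mouseY variable.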

Color Scheme:
Red & White

Background and Sketches

Query
SQL queries:
select month(cout), year(cout),
       sum(case when itemtype in ("acbk", "jcbk") then 1 else 0 end) as book
from inraw
where title like "% data %"
  and date(cout) >= "2011-01-01"
  and date(cout) <= "2014-03-03"
group by month(cout), year(cout)
order by year(cout), month(cout)

select month(cout), year(cout),
       sum(case when itemtype in ("acbk", "jcbk") then 1 else 0 end) as book
from inraw
where title like "%big data %"
  and date(cout) >= "2011-01-01"
  and date(cout) <= "2014-03-03"
group by month(cout), year(cout)
order by year(cout), month(cout)

New York Times queries:
Example request
// Build the Article Search request for "big data" in January 2014
String request = "";
request += "http://api.nytimes.com/svc/search/v2/articlesearch" + ".json";
request += "?q=" + "big+data";
request += "&facet_field=source";
request += "&facet_filter=true";
request += "&begin_date=" + 20140101;
request += "&end_date=" + 20140131;
request += "&api-key=" + "362412324bfd4d15c3937caa04125f28:9:68817159";
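Since the example request covers a single month, one request per month is needed to cover the whole period. A minimal sketch of how such requests could be generated follows. The class NytRequestBuilder and its build method are hypothetical names, the API key is a placeholder, and the URL and parameters are copied from the example request above rather than checked against the current NYT API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class NytRequestBuilder {
    // Endpoint taken from the example request in this write-up.
    static final String BASE = "http://api.nytimes.com/svc/search/v2/articlesearch.json";
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20140131.
    static final DateTimeFormatter YMD = DateTimeFormatter.BASIC_ISO_DATE;

    // Builds the request URL for one search term and one calendar month.
    static String build(String query, YearMonth month, String apiKey) {
        // URLEncoder turns spaces into '+', so "big data" becomes "big+data".
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE
            + "?q=" + q
            + "&facet_field=source"
            + "&facet_filter=true"
            + "&begin_date=" + month.atDay(1).format(YMD)
            + "&end_date=" + month.atEndOfMonth().format(YMD)
            + "&api-key=" + apiKey; // passed through unchanged, as in the original request
    }

    public static void main(String[] args) {
        // Print one request URL per month for 2011 through 2013.
        YearMonth month = YearMonth.of(2011, 1);
        YearMonth last = YearMonth.of(2013, 12);
        while (!month.isAfter(last)) {
            System.out.println(build("big data", month, "YOUR_API_KEY"));
            month = month.plusMonths(1);
        }
    }
}
```

Using YearMonth avoids hard-coding the last day of each month, including February in leap years.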
Results
The results were striking. The number of NYT articles matching ‘Data’ rises sharply in August 2012, and the number of results for ‘Big Data’ doubles during the same month.

A corresponding increase is noted for the Seattle Public Library as well. There is a noticeable increase in checkouts of ‘Data’ related books in August 2013: an 8-month time difference. What is interesting is that the first checkout of a ‘Big Data’ related book happens in September 2013, roughly one full year after the field of data (and, by extension, big data) exploded in the computer science industry. This raises interesting questions for the publishing industry: how long it takes to write a technical book on a newly emerging field, and how the readership of books and of articles on emerging technologies differs.
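One simple way to put a number on the delay between the two sources is to find the month with the largest month-over-month jump in each series and subtract the two positions. The sketch below is an assumption about how this could be done, not the method used in the project. The class LagSketch, its method names and the sample arrays are hypothetical, and the values are made up. Both arrays are assumed to start in the same month.

```java
public class LagSketch {
    // Returns the index of the month with the largest increase over the
    // previous month, or -1 if the series has fewer than two entries.
    static int largestJumpIndex(int[] monthly) {
        int best = -1;
        int bestDelta = Integer.MIN_VALUE;
        for (int i = 1; i < monthly.length; i++) {
            int delta = monthly[i] - monthly[i - 1];
            if (delta > bestDelta) {
                bestDelta = delta;
                best = i;
            }
        }
        return best;
    }

    // Lag, in months, between the largest jump in the leading series
    // and the largest jump in the following series.
    // Assumes both series have at least two entries and share a start month.
    static int lagInMonths(int[] leading, int[] following) {
        return largestJumpIndex(following) - largestJumpIndex(leading);
    }

    public static void main(String[] args) {
        // Made-up monthly counts, for illustration only.
        int[] nyt = {10, 12, 40, 42, 41, 45};
        int[] spl = {1, 1, 2, 2, 9, 10};
        System.out.println("Lag in months: " + lagInMonths(nyt, spl));
    }
}
```

A single largest jump is a crude measure; it ignores gradual growth and would need smoothing on noisy monthly counts.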

Final result


Code
I used Processing.

Run in Browser

Source Code