Data Science on GitHub
MAT 259, 2015
Anastasiya Lazareva

Concept
I decided to use the GitHub API for this project to collect data on repositories that are related to data visualization and machine learning.

Query
I used the GitHub API (https://developer.github.com/v3/) along with geographic coordinate data from the Data Science Toolkit API (http://www.datasciencetoolkit.org/). Because the GitHub API is rate limited, I collected the data first and saved it in JSON files.
Code used to collect data
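The collect-once, save-to-JSON approach described above can be sketched as follows. This is a minimal illustration in Python, not the project's actual collection script: the function names (search_url, fetch_json, cached_fetch), the cache file layout, and the choice of the repository search endpoint are all assumptions made for the example.

```python
import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def search_url(keyword, page=1, per_page=100):
    """Build a GitHub repository search URL for one keyword (assumed endpoint)."""
    query = urllib.parse.urlencode(
        {"q": keyword, "page": page, "per_page": per_page}
    )
    return f"{API_ROOT}/search/repositories?{query}"


def fetch_json(url):
    """Download one URL and decode the JSON body. This makes a network call."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github.v3+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def cached_fetch(url, cache_path, fetch=fetch_json):
    """Return the JSON for url, reading cache_path if it already exists.

    Each URL is requested at most once, so reruns of the script do not
    spend any of the API rate limit on data that is already on disk.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as handle:
            return json.load(handle)
    data = fetch(url)
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data
```

A call such as cached_fetch(search_url("machine learning"), "data/machine_learning_1.json") would hit the network only the first time; the file name here is hypothetical.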


Preliminary sketches
There were no preliminary sketches for this project.

Process
I first collected data from the GitHub API. Then I collected statistics for each repository using the repository and owner name. I thought this data could be used for an interesting data visualization. I'm personally interested in who the top contributors are, what languages they use, and how much activity is happening in their repositories.
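As an illustration of looking up statistics by owner and repository name, here is a minimal Python sketch. The exact statistics the project collected are not listed here, so the three endpoints below (languages, contributor stats, commit activity) are an assumption chosen to match the stated interests, and stats_urls and top_contributors are hypothetical helper names.

```python
def stats_urls(owner, repo):
    """Build per-repository GitHub API URLs from the owner and repository name."""
    base = f"https://api.github.com/repos/{owner}/{repo}"
    return {
        "languages": f"{base}/languages",
        "contributors": f"{base}/stats/contributors",
        "commit_activity": f"{base}/stats/commit_activity",
    }


def top_contributors(contributor_stats, limit=3):
    """Rank entries of a contributor-stats response by total commits.

    Each entry is expected to carry a "total" commit count and an "author"
    object with a "login"; a missing author is reported as "unknown".
    """
    ranked = sorted(contributor_stats, key=lambda entry: entry["total"], reverse=True)
    return [
        ((entry.get("author") or {}).get("login", "unknown"), entry["total"])
        for entry in ranked[:limit]
    ]
```

The ranking works on already-downloaded JSON, so it can run against saved files without touching the rate limit.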

Final result
The final visualization has the following features:




Code
I used Processing.

Control: All user interaction is contained within the UI. The user can enable/disable keywords at the top and click on the repository graph to get more detailed information about individual repositories.

Source Code + Data