"What does Reddit think of Fifty Shades of Grey?" An interactive visualization by subreddit
MAT 259, 2015
James Schaffer

Concept
In keeping with this class's theme of books, I decided to see what Reddit was saying about Fifty Shades of Grey. With the recent movie release, there had been a handful of highly entertaining news stories and incidents, so I thought it would be interesting to visualize the sentiment of posts from different subreddits.

For those who don't know, Reddit is a large collection of forums (subreddits), each with its own topic, that together constitute an anonymous social network. Visitors are greeted on the front page with a list of 'currently hot' posts. Reddit's defining feature is the ability to upvote or downvote other users' posts (imagine Facebook with a 'dislike' button added), so content, credibility, and moderation are effectively crowdsourced. Unfortunately, this also causes a lot of cat pics to rise to the front page.

Some references for those that are interested:
Reddit API
Some info on Reddit rankings

API Access
First, I accessed Reddit's API and collected every post that mentioned the book "Fifty Shades of Grey." Accessing the API is not entirely straightforward, since results have to be paged through. The following Java code accomplishes this:

package redditjson;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;

import org.json.JSONException;
import org.json.JSONObject;

public class RedditPager {

   public static void main( String args[] ) {
      int iteration = 0;
      String after = "";
      
      try {
         PrintWriter writer = new PrintWriter( "fifty.json" );
         String redditSearch = "fifty+shades+of+grey";
         
         while ( iteration < 100 ) {
            JSONObject nextRedditPage = null;
            try {
               String query = "http://www.reddit.com/search.json?q=" + redditSearch + "&limit=100&sort=relevance&t=all" + after;
               System.out.println( "Trying: " + query );
               nextRedditPage = readJsonFromUrl( query );
               Thread.sleep(5000);
               
               String afterString = nextRedditPage.getJSONObject( "data" ).getString( "after" );
               after = "&after=" + afterString;
               writer.println( nextRedditPage.toString() );
               writer.flush();
               iteration += 1;

               
            } catch (InterruptedException e) {
               // Sleep was interrupted; fall through and request the next page.
            } catch (IOException e) {
               // Network error; loop again and retry the same request.
            } catch (JSONException e) {
               // A missing "after" field means we reached the last page.
               if ( nextRedditPage != null )
                  writer.println( nextRedditPage.toString() );
               writer.flush();
               System.out.println( "Done." );
               break;
            }
         }
         
      } catch (FileNotFoundException e1) {
         // TODO Auto-generated catch block
         e1.printStackTrace();
      }
   }
   
   public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
      InputStream is = new URL(url).openStream();
      try {
         BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
         String jsonText = readAll(rd);
         return new JSONObject(jsonText);
      } finally {
         is.close();
      }
   }

   private static String readAll(Reader rd) throws IOException {
      StringBuilder sb = new StringBuilder();
      int cp;
      while ((cp = rd.read()) != -1) {
         sb.append((char) cp);
      }
      return sb.toString();
   }
   
}

Again, Reddit's results have to be 'paged through', which makes the JSON access slightly nontrivial: Reddit returns at most 100 results at a time and requires you to pass the 'after' token from the previous page to get the next one.

Next, I wanted to use a machine-learned classifier to label the sentiment of each post. Unfortunately, offline sentiment analyzers cost money, so I used a free web API with a rate limit, found here. The analyzer returns a classification label for each body of text, along with three probabilities that correspond to the 'quantity' of negative, positive, and neutral text. The following Java code automated the sentiment analysis for the 1,000 or so posts:

package redditjson;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

import org.json.*;

public class RedditJSONParser {
   
   public static void main( String args[] ) {

      try {
         BufferedReader f = new BufferedReader( new FileReader( "fifty.json" ) );
         String line = null;
         while ( (line = f.readLine()) != null ) {
            JSONObject listing = new JSONObject( line );
            JSONObject listingData = listing.getJSONObject( "data" );
            JSONArray topics = listingData.getJSONArray( "children" );
            for ( int j = 0; j < topics.length(); j++ ) {
               JSONObject nextTopic = topics.getJSONObject( j );
               JSONObject nextTopicData = nextTopic.getJSONObject( "data" );
               String nextID = nextTopicData.getString( "id" );
               int nextScore = nextTopicData.getInt( "score" );
               int nextUps = nextTopicData.getInt( "ups" );
               int nextDowns = nextTopicData.getInt( "downs" );
               String nextSubreddit = nextTopicData.getString( "subreddit" );
               String nextTitle = nextTopicData.getString( "title" );
               String nextSelfText = nextTopicData.getString( "selftext" );
               int numComments = nextTopicData.getInt( "num_comments" );
               
               if ( !nextSelfText.equals( "" ) ) {
                  
                  try {
                     JSONObject sentiment = new JSONObject( getSentimentString( nextSelfText ) );
                     String sentimentLabel = sentiment.getString( "label" );
                     JSONObject p = sentiment.getJSONObject( "probability" );
                     double neg = p.getDouble( "neg" );
                     double neutral = p.getDouble( "neutral" );
                     double pos = p.getDouble( "pos" );
                     System.out.println( 
                           nextID + "|||" + 
                           nextSubreddit + "|||" + 
                           nextSelfText.replaceAll( "\n", "   " ) + "|||" +
                           nextTitle + "|||" + 
                           nextScore + "|||" + 
                           nextUps + "|||" + 
                           nextDowns + "|||" + 
                           numComments + "|||" + sentimentLabel + "|||" + neg + "|||" +   neutral + "|||" + pos);
                  }
                  catch ( IOException e ) {
                     // Sentiment API call failed; skip this post.
                     continue;
                  }

                  // Respect the sentiment API's rate limit.
                  Thread.sleep( 3000 );

               }
               
               
            }
            
         }
         f.close();
         
      } catch (Exception e) {
         e.printStackTrace();
      }
   
   }
   
   public static String getSentimentString(String data) throws IOException, JSONException {
      // URL-encode the post text so special characters survive the form POST.
      String urlParameters  = "text=" + java.net.URLEncoder.encode( data, "UTF-8" );
      byte[] postData       = urlParameters.getBytes( Charset.forName( "UTF-8" ));
      int    postDataLength = postData.length;
      String request   = "http://text-processing.com/api/sentiment/";
      URL    url = new URL( request );
      HttpURLConnection cox = (HttpURLConnection) url.openConnection();           
      cox.setDoOutput( true );
      cox.setDoInput ( true );
      cox.setInstanceFollowRedirects( false );
      cox.setRequestMethod( "POST" );
      cox.setRequestProperty( "Content-Type", "application/x-www-form-urlencoded"); 
      cox.setRequestProperty( "charset", "utf-8");
      cox.setRequestProperty( "Content-Length", Integer.toString( postDataLength ));
      cox.setUseCaches( false );
      try( DataOutputStream wr = new DataOutputStream( cox.getOutputStream())) {
         wr.write( postData );
      }

      BufferedReader in = new BufferedReader(new InputStreamReader(cox.getInputStream()));
      String inputLine;
      StringBuffer response = new StringBuffer();

      while ((inputLine = in.readLine()) != null) {
         response.append(inputLine);
      }
      in.close();
      
      //print result
      return response.toString();
   }
   
}

Preliminary sketches
Next, I created a visualization based on the idea of a galaxy, where each solar system is a subreddit and each planet is a post. To make the visualization useful, I realized interaction was required, especially drilldown. I planned to color each subreddit and post to indicate sentiment: positive sentiment appears green, negative appears red, and mixed appears yellow. Clicking a subreddit zooms in to show its posts, and clicking a post shows the original text.

The first iteration used a static black background, and had simple patterns for the solar objects. The movement of objects wasn't tuned very finely, the text appeared cluttered, and the lack of hardware acceleration caused some jerky performance.

Process
For this project, I was highly inspired by previous work using polar coordinates to create unique patterns in 2D drawing. The colors, shapes, and text are kept relatively simple here to let the animation and motion take center stage.

The motion of each object is elegantly described by a single equation, with R and THETA parameters giving each object's position at a given time. Additional radius and theta terms nested in the equation create the oscillation in the object's orbit, and randomly generated noise parameters deviate the object's path from a perfect circle.
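A minimal sketch of how such a polar-coordinate formula might look (my reconstruction, with hypothetical parameter names, not the project's actual code): the base angle advances with time, a nested sine term oscillates the radius, and a per-object noise offset breaks the perfect circle.

```java
public class Orbit {

    // Returns {x, y} for an object at time t.
    // baseRadius: nominal orbit radius; angularSpeed: how fast THETA advances;
    // wobbleAmp/wobbleFreq: nested oscillation of R; noise: per-object offset.
    static double[] position(double t, double baseRadius, double angularSpeed,
                             double wobbleAmp, double wobbleFreq, double noise) {
        double theta = angularSpeed * t;                                          // THETA parameter
        double r = baseRadius + wobbleAmp * Math.sin(wobbleFreq * theta) + noise; // R parameter
        return new double[] { r * Math.cos(theta), r * Math.sin(theta) };
    }

    public static void main(String[] args) {
        double[] p = position(0.0, 100.0, 0.5, 10.0, 3.0, 0.0);
        System.out.println(p[0] + ", " + p[1]); // at t=0 the object sits at (100, 0)
    }
}
```

Evaluating this once per frame with increasing t traces a wobbling, slightly irregular orbit.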

To obtain smooth motion, two vectors are tracked for each object: one for where the object is SUPPOSED to be, and one for where it actually is. Each frame, the object moves toward the desired position with speed proportional to the distance between the two, scaled by a constant. The result is that each object "settles" into place when moved. The same technique is applied to other parameters, such as size and orbits, producing smooth transitions between states.
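The settling behavior described above can be sketched as follows (hypothetical names; the easing constant is illustrative, not the value used in the project):

```java
public class Easing {

    // Fraction of the remaining distance covered each frame; tune for
    // snappier or slower settling.
    static final double EASE = 0.1;

    // One update step: moves `current` toward `target` with speed
    // proportional to the remaining distance.
    static double step(double current, double target) {
        return current + (target - current) * EASE;
    }

    public static void main(String[] args) {
        double pos = 0.0, target = 100.0;
        for (int frame = 0; frame < 60; frame++) {
            pos = step(pos, target);
        }
        System.out.println(pos); // approaches 100, never overshooting
    }
}
```

Because the step size shrinks with the remaining distance, the motion decays exponentially and the object eases to a stop rather than snapping into place.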

The illusion of depth is completed by giving each object a z-coordinate and using a buffer to draw objects in order of their z-coordinates. Each object's size is scaled by its z-coordinate, creating the illusion of distance.
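A sketch of this painter's-algorithm ordering and size scaling, under assumed names (the focal-length constant and perspective formula are illustrative choices, not taken from the project):

```java
import java.util.ArrayList;
import java.util.List;

public class DepthSort {

    static class Obj {
        double x, y, z, baseSize;
        Obj(double x, double y, double z, double baseSize) {
            this.x = x; this.y = y; this.z = z; this.baseSize = baseSize;
        }
        // Simple perspective scaling: farther objects (larger z) appear smaller.
        double apparentSize(double focal) {
            return baseSize * focal / (focal + z);
        }
    }

    // Sort far-to-near so nearer objects are drawn on top of farther ones.
    static void sortForDrawing(List<Obj> objs) {
        objs.sort((a, b) -> Double.compare(b.z, a.z));
    }

    public static void main(String[] args) {
        List<Obj> objs = new ArrayList<>(List.of(
                new Obj(0, 0, 10, 20), new Obj(0, 0, 200, 20)));
        sortForDrawing(objs);
        System.out.println(objs.get(0).z); // farthest object comes first
    }
}
```

Iterating the sorted list and drawing each object at its apparent size gives correct occlusion without a true 3D renderer.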

Finally, the galaxy metaphor could only be completed by the presence of 'stars' in the background. For this, I removed stopwords from the original posts and drew words from the remaining distribution to generate keywords on the fly. A particle system draws these keywords in the background, and the alpha of each 'star' keyword pulses as it moves through the field of view.
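The keyword-extraction step might look something like this (the stopword list here is illustrative, not the one used in the project):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class Keywords {

    // A tiny illustrative stopword list; a real one would be much longer.
    static final Set<String> STOPWORDS = Set.of(
            "the", "a", "an", "and", "or", "of", "to", "is", "it", "in", "that");

    // Lowercase the text, split on non-letters, and drop stopwords; what
    // remains is the distribution that keyword 'stars' are drawn from.
    static List<String> keywords(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z']+")) {
            if (!w.isEmpty() && !STOPWORDS.contains(w)) out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keywords("The plot of the book is absurd")); // [plot, book, absurd]
    }
}
```

Sampling uniformly from the returned list naturally favors the words a post repeats most.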


Final result
Features were added to sort all posts/subreddits by sentiment for a more informative visualization. Reddit's sentiment toward the book seems varied, with an overall slightly negative tone (easily seen once the subreddits are sorted). Most users criticize 50SoG for poor writing, unconvincing characters, and an absurd plot. More conservative or religious individuals are concerned about the moral implications of the book, and generally speak very negatively about it.

Code

Code can be downloaded by clicking here: Controversy Galaxy Visualization. Processing, MySQL, and Java were used for this project, along with the Reddit and sentiment analysis APIs.

Click a subreddit to see posts. Click a post to see text. Mousewheel to scroll. Pressing any key goes up one level.

Top level sliders and buttons can be used to customize the planetary motions, speed, and deviation from orbit.