Quantum Aesthetics. Interdisciplinarity in Google Scholar Data
MAT 259, 2015
This project examines the literal "common ground" of quantum physics and aesthetics by evaluating 2,000 papers from Google Scholar. The most common keywords from the "quantum" papers form the terrain on which the "aesthetics" papers are mapped as a cloud of amplituhedrons, a recently discovered jewel-like geometric object that dramatically simplifies calculations of particle interactions. The size of each amplituhedron correlates with the number of its keywords that match the terrain, while its height correlates with the corresponding paper's rank in the Google Scholar search. The visualization runs in two modes: an exploratory "quantum mode" and a "sketch mode" that gives easier access to the data.
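The mapping from data to geometry can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the original does not say how size and height are scaled, so the linear interpolation, the bounds, the convention that the top-ranked result is drawn highest, and the function names are all hypothetical.

```python
# Hypothetical mapping from paper data to amplituhedron geometry.
# ASSUMPTIONS: linear scaling, arbitrary bounds, rank 1 drawn highest.

def amplituhedron_size(matches, max_matches, min_size=1.0, max_size=10.0):
    """Scale linearly with the number of keywords shared with the terrain."""
    if max_matches <= 0:
        return min_size
    t = min(matches, max_matches) / max_matches  # clamp to [0, 1]
    return min_size + t * (max_size - min_size)


def amplituhedron_height(rank, num_results, max_height=100.0):
    """Map search rank 1 to max_height and the last rank to 0."""
    if num_results <= 1:
        return max_height
    t = (rank - 1) / (num_results - 1)
    return max_height * (1.0 - t)


if __name__ == "__main__":
    print(amplituhedron_size(5, 10))      # 5.5
    print(amplituhedron_height(1, 1000))  # 100.0
```

A logarithmic scale would be an equally plausible choice if a few papers match far more keywords than the rest.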
Google Scholar, for obvious reasons (ads, licensing, etc.), does not offer individuals easy API access to its data, let alone automated data crawlers ("bots"). This is why I built a custom Google Scholar scraper in Python that works in five steps:
1.) It sends an HTTP request for just one page of search results to the Google server via a randomly selected proxy server and with a randomly selected user agent in the request header.
2.) It uses the "beautifulsoup" library to parse the received HTML for the tags, identified by hand beforehand, that enclose the data we are after.
3.) It identifies the most common keywords for a single paper with the help of natural language processing methods, neatly stacks the data and writes it to a CSV file.
4.) It waits for a random amount of time, increases the results page number and then goes back to the first step until the maximum of 1000 results (a number hardcoded by Google into Google Scholar) is reached.
5.) It writes the accumulated, most common keywords to a different CSV file.
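The request-and-paginate part of the steps above (steps 1 and 4) might look roughly like this. It is a sketch, not the original scraper: it uses the standard library's `urllib.request` in place of whatever HTTP client the original used, the proxy addresses, user-agent strings, and wait times are placeholders, and `scrape`, `fetch_page`, `page_url`, and the `handle_page` callback (standing in for steps 2 and 3) are hypothetical names. It also assumes Scholar pages through results with a `start` offset parameter, ten results per page.

```python
import random
import time
import urllib.parse
import urllib.request

# Placeholder pools (hypothetical); the original lists are not given.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0",
]
PROXIES = ["http://203.0.113.10:8080", "http://203.0.113.20:3128"]  # documentation-only IPs

RESULTS_PER_PAGE = 10
MAX_RESULTS = 1000  # Google Scholar returns at most 1000 results per query


def page_url(query, page):
    """URL of one results page; 'start' is the offset of its first result."""
    params = urllib.parse.urlencode({"q": query, "start": page * RESULTS_PER_PAGE})
    return "https://scholar.google.com/scholar?" + params


def fetch_page(url):
    """Step 1: request one page via a random proxy and a random user agent."""
    proxy = random.choice(PROXIES)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    request = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with opener.open(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(query, handle_page, fetch=fetch_page, sleep=time.sleep,
           min_wait=5.0, max_wait=20.0):
    """Steps 1 and 4: fetch each page, pass it to handle_page (steps 2-3),
    wait a random time, and move on until MAX_RESULTS is covered.
    handle_page returns False when a page has no results, ending early."""
    for page in range(MAX_RESULTS // RESULTS_PER_PAGE):
        html = fetch(page_url(query, page))
        if not handle_page(html):
            break
        sleep(random.uniform(min_wait, max_wait))
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access; in real use, `scrape("aesthetics", handle_page)` would request up to 100 pages with a random pause between them.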
Alternatively, the Python script can read in a directory of HTML files that were saved by hand.
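Parsing saved pages, counting keywords, and writing both CSV files (steps 2, 3, and 5, applied to a directory of hand-saved HTML files) could be sketched as below. Two substitutions are worth stating plainly: it uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra packages, and it replaces the unspecified natural language processing step with simple word counting minus a small stopword list. The class names `gs_rt` (title) and `gs_rs` (snippet) are assumptions about Scholar's markup, which changes over time; all function names and the CSV layout are hypothetical. Ranks follow the sorted file names, so the files are assumed to be named in page order.

```python
import csv
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path

# ASSUMPTION: class names of Scholar's result markup (title and snippet);
# the original identified its tags by hand, and Google may change them.
FIELD_CLASSES = {"gs_rt": "title", "gs_rs": "snippet"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}
# Minimal hypothetical stopword list standing in for a real NLP toolkit.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "are",
             "was", "its", "into", "not", "but", "can", "has"}


class ScholarParser(HTMLParser):
    """Collect the text of title and snippet elements, one record per paper."""

    def __init__(self):
        super().__init__()
        self.records = []  # dicts with "title" and "snippet", in page order
        self.field = None  # field currently being read, if any
        self.depth = 0     # element nesting depth inside that field

    def handle_starttag(self, tag, attrs):
        if self.field:
            if tag not in VOID_TAGS:  # void tags never get an end tag
                self.depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self.field, self.depth = FIELD_CLASSES[cls], 1
                if self.field == "title":  # a title starts a new paper
                    self.records.append({"title": "", "snippet": ""})
                return

    def handle_endtag(self, tag):
        if not self.field or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:  # the field's own element has closed
            if self.records:  # collapse whitespace in the finished field
                text = self.records[-1][self.field]
                self.records[-1][self.field] = " ".join(text.split())
            self.field = None

    def handle_data(self, data):
        if self.field and self.records:
            self.records[-1][self.field] += " " + data


def keywords(text, top_n=10):
    """Simple stand-in for the NLP step: most frequent non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def process_directory(html_dir, papers_csv, keywords_csv):
    """Write one row per paper, then the keyword totals across all papers."""
    totals = Counter()
    with open(papers_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "title", "keywords"])
        rank = 0
        for path in sorted(Path(html_dir).glob("*.html")):
            parser = ScholarParser()
            parser.feed(path.read_text(encoding="utf-8", errors="replace"))
            for record in parser.records:
                rank += 1
                kws = keywords(record["title"] + " " + record["snippet"])
                totals.update(kws)  # counts papers per keyword
                writer.writerow([rank, record["title"], " ".join(kws)])
    with open(keywords_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "papers"])
        writer.writerows(totals.most_common())
```

A real NLP pipeline (stemming, part-of-speech filtering) would merge forms such as "system" and "systems" that this sketch counts separately.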
I started with just the "quantum mode" but decided that another mode was necessary for the visualization to provide useful results. For that mode, I worked in A4 (landscape), creating an "analog" interface that borrows graphical elements and color schemes from hand-drawn plans and diagrams (delicate lines, black and red ink on slightly toned-down white paper) while remaining three-dimensional, interactive, and animated. I deliberately allowed text, set in a serif typewriter font, to overlap, fade, and disappear.
Interestingly, the visualization does reach the quintessential "overlap" of quantum physics and aesthetics, which is the world of abstract reasoning. Shared high-scoring keywords like "theory", "investigation", "assume", "system", "analytic", and others are the literal "common ground" of both disciplines: the analytical framework each uses to explore its subject. Beyond that, the visualization surfaces single papers that deal with exactly this overlap, one example being "Philosophy of Quantum Mechanics" (M. Jammer, 1974), represented by one of the largest and highest-scoring amplituhedrons in the visualization. Finally, the visualization's analytical back-end has several other possible uses, such as looking at the shared interests of different authors or finding the "missing link" between two specific questions.
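Finding the "common ground" and the papers that bridge both fields reduces to multiset and set intersections over keyword counts. This is a minimal sketch, assuming per-set keyword counts like those accumulated into the keyword CSV files; `common_ground` and `bridging_papers` are hypothetical names, and ranking shared keywords by their smaller count is one choice among several (a product or a normalized score would also work).

```python
from collections import Counter


def common_ground(quantum_counts, aesthetics_counts, top_n=10):
    """Keywords frequent in both paper sets, ranked by the smaller count."""
    shared = quantum_counts & aesthetics_counts  # intersection keeps min counts
    return [word for word, _ in shared.most_common(top_n)]


def bridging_papers(papers, terrain_keywords, top_n=5):
    """Papers whose keywords overlap most with the terrain keywords.

    papers: iterable of (title, keyword_list) pairs.
    """
    terrain = set(terrain_keywords)
    scored = [(len(terrain & set(kws)), title) for title, kws in papers]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(title, score) for score, title in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Made-up counts for illustration only, not the project's data.
    quantum = Counter({"theory": 40, "system": 30, "quantum": 90, "analytic": 12})
    aesthetics = Counter({"theory": 25, "system": 8, "beauty": 50, "analytic": 15})
    print(common_ground(quantum, aesthetics))  # ['theory', 'analytic', 'system']
```

Scoring a paper by how many terrain keywords it shares mirrors how amplituhedron size is described; `common_ground`, applied to two authors' keyword counts instead of two paper sets, would cover the "shared interests" use case.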
The visualization itself is written in C++.
Controls: Use the mouse to explore. Click on any amplituhedron to show its related data. Press L to switch between sketch and quantum modes. Press S to switch the terrain and cloud data sources.