Perfect title (3D)
MAT 259, 2016
Niklas Griessbaum
Concept
And what words should a perfect title contain?
Query
I wanted to find the best word sequence to form a sucessful title in respect to a dewey class.
I therefore first searched for the two most popular first words in a title. As popularity, I defined how many items were checked out with
a certain word as the first word in the title.
I then looked for the two most popular words that follow these first words, and then for the two most popular words that follow each of those words.
I iterated that process for several iterations.
It appeared to be easier to (automatically) re-execute the a query with changing parameters than writing a potentially complicated SQL query that would get all the values at once.
The formatable query to retrieve the first two words:
- Code:
-
SELECT
COUNT(id) AS checkouts,
LEFT(deweyClass, 2) AS deweySub,
{wordNum},
SUBSTRING_INDEX(SUBSTRING_INDEX(title, ' ', {wordNum}),' ', -1) as word
FROM
spl_2016.inraw
WHERE
LEFT(deweyClass, 2) = '{dewey}'
GROUP BY
word
ORDER BY
checkouts DESC
LIMIT
5
And the formatable query to retrieve each subsequent words:
-
- Code:
SELECT
LEFT(deweyClass, 1) as dewey,
LOWER('{word}') as parent,
COUNT(id) as checkouts,
SUBSTRING_INDEX(SUBSTRING_INDEX(LOWER(title), '{word} ', -1),' ', 1) as nextWord
FROM
spl_2016.inraw
WHERE
LEFT(deweyClass, 1) = "{dewey}"
AND
cout > "{startDate}"
AND
title LIKE '%{word} %'
GROUP BY
nextWord
ORDER BY
checkouts DESC
LIMIT
2;
Final result
Evaluation/Analysis
The visualization and the mechanics work and are somewhat pleasing. However, it is not trivial to balance out the forces in order to have
the connected nodes appear with an acceptable spacing and centered in the scope of the camera.
Code