Flocking Simulation Based on Checkout Co-occurrence
MAT 259, 2023
Lu Yang

Concept
I want to create a dynamic, self-organized flocking simulation based on books that were checked out at the same time. I expected the following to be worth observing:
1. Book titles containing a specific keyword may fall under different Dewey classes and different subjects
2. Books checked out together with these books may cover an even broader range of Dewey classes and subjects
3. These books may cluster around Dewey classes other than their assigned ones

Query
With these assumptions in mind, I queried book titles containing the keyword “architecture”, chosen because the word carries different meanings across disciplines. For each such title, I also queried the books that were checked out and checked in at exactly the same times (matching both timestamps), as an approximation of related books that may not contain the keyword “architecture”.

SELECT

t1.bibNumber AS A_bib,
t1.title AS A_title,
FLOOR(t1.deweyClass) AS A_dewey,
t3.subject AS A_subject,

t2.bibNumber AS B_bib,
t2.title AS B_title,
FLOOR(t2.deweyClass) AS B_dewey,
t4.subject AS B_subject

FROM
(SELECT
bibNumber,
GROUP_CONCAT(subject
SEPARATOR ';') AS subject
FROM
spl_2016.subject
WHERE
subject REGEXP '^[0-9a-zA-Z .]+$'
-- approximately filter out non-English subjects
-- The anchors ^ and $ ensure that the entire string must match, not just part of it.
-- The character class [0-9a-zA-Z .] matches a single digit, upper/lower case letter, space, or period.
-- The + quantifier repeats the preceding class one or more times,
-- so a subject is kept only if it consists entirely of digits, letters, spaces, and periods
-- (subjects containing commas, hyphens, other punctuation, or non-ASCII characters are dropped too).
GROUP BY bibNumber) AS t3,

(SELECT
bibNumber,
GROUP_CONCAT(subject
SEPARATOR ';') AS subject
FROM
spl_2016.subject
WHERE
subject REGEXP '^[0-9a-zA-Z .]+$'
-- same subject filter as in t3 above
GROUP BY bibNumber) AS t4,

spl_2016.inraw t1
INNER JOIN
spl_2016.inraw t2 ON t1.cout = t2.cout AND t1.cin = t2.cin
AND t1.title LIKE '%architecture%'
AND t1.bibNumber != t2.bibNumber
AND t1.deweyClass != ''
AND t2.deweyClass != ''
-- AND YEAR(t1.cout) > 2017

WHERE
t1.bibNumber = t3.bibNumber
AND t2.bibNumber = t4.bibNumber
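The regular expression in the subject subqueries is only a rough proxy for "English": it keeps any string made entirely of ASCII digits, letters, spaces, and periods. The sketch below applies the same pattern with Java's java.util.regex to a few hypothetical subject strings (not taken from the SPL data) to show what survives. It assumes MySQL's REGEXP behaves the same way for this simple ASCII pattern; the class name SubjectFilter is illustrative.

```java
import java.util.List;
import java.util.regex.Pattern;

// Mirrors the query's filter '^[0-9a-zA-Z .]+$' to show which subjects it keeps.
public class SubjectFilter {
    // Whole-string match: only ASCII digits, letters, spaces, and periods.
    static final Pattern KEEP = Pattern.compile("^[0-9a-zA-Z .]+$");

    static boolean keep(String subject) {
        return KEEP.matcher(subject).matches();
    }

    public static void main(String[] args) {
        // Hypothetical subject strings for illustration only.
        List<String> samples = List.of(
            "Computer architecture",
            "Architecture Designs and plans",
            "Architecture, Domestic",          // comma: dropped
            "Arquitectura",                    // Spanish, but ASCII only: kept
            "Architektur \u00d6sterreich");    // non-ASCII letter: dropped
        for (String s : samples) {
            System.out.println((keep(s) ? "keep " : "drop ") + s);
        }
    }
}
```

As the samples suggest, the filter is character-based rather than language-aware: an English subject with a comma is dropped, while a non-English word written in plain ASCII is kept.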



From the queried data, I ran a second analysis that records, for each subject appearing in the dataset:
1. Its own Dewey classes
2. Subjects that co-occur with it on the same book title
3. Frequencies of these co-occurrences
4. Dewey classes of the books checked out together with it
5. Subjects of the books checked out together with it
6. Frequencies of these co-occurrences
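The second analysis can be sketched as a few count maps keyed by subject. This is a hypothetical reconstruction, not the original code: it assumes each query row is reduced to (A_dewey, A_subject, B_dewey, B_subject), with subjects joined by ';' as GROUP_CONCAT produces them, and it counts each row in both directions so that subjects of the B books get records too. The names SubjectStats, Stats, and addRow, and the sample rows, are illustrative.

```java
import java.util.*;

// Per-subject counts for the six quantities listed above (sketch).
public class SubjectStats {
    static class Stats {
        final Set<Integer> ownDewey = new TreeSet<>();                  // item 1
        final Map<String, Integer> sameBookSubjects = new TreeMap<>();  // items 2-3
        final Map<Integer, Integer> coBookDewey = new TreeMap<>();      // item 4
        final Map<String, Integer> coBookSubjects = new TreeMap<>();    // items 5-6
    }

    final Map<String, Stats> bySubject = new TreeMap<>();

    // One query row; assumption: count it from both books' points of view.
    void addRow(int aDewey, String aSubjects, int bDewey, String bSubjects) {
        record(aDewey, aSubjects, bDewey, bSubjects);
        record(bDewey, bSubjects, aDewey, aSubjects);
    }

    private void record(int ownDewey, String own, int otherDewey, String other) {
        String[] ownList = own.split(";");
        String[] otherList = other.split(";");
        for (String s : ownList) {
            Stats st = bySubject.computeIfAbsent(s, k -> new Stats());
            st.ownDewey.add(ownDewey);
            for (String t : ownList) {
                if (!t.equals(s)) st.sameBookSubjects.merge(t, 1, Integer::sum);
            }
            st.coBookDewey.merge(otherDewey, 1, Integer::sum);
            for (String u : otherList) st.coBookSubjects.merge(u, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        SubjectStats stats = new SubjectStats();
        // Hypothetical rows, not real SPL data.
        stats.addRow(720, "Architecture;Architecture Designs and plans", 747, "Interior decoration");
        stats.addRow(720, "Architecture", 4, "Computer architecture");
        Stats arch = stats.bySubject.get("Architecture");
        System.out.println(arch.ownDewey);          // [720]
        System.out.println(arch.sameBookSubjects);  // {Architecture Designs and plans=1}
        System.out.println(arch.coBookDewey);       // {4=1, 747=1}
    }
}
```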




Process
The flocking simulation is not intended to deliver a result right from the start. Instead, it requires the viewer to interact with the provided parameters and to observe the forms that self-organize over time. Different parameter setups can lead to different results.
The flocking system contains two components:
I. Static points, which represent Dewey classes
Position
the first three digits of the Dewey class (abc.def)
a -> point.x, b -> point.y, c -> point.z
Scale
determined by the number of subjects that belong to the Dewey class
Attraction force
creates an attraction force on agents that belong to this Dewey class
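A static Dewey point can be sketched as follows, assuming the Dewey class arrives as the floored integer from the query (0 to 999, so 720 yields the digits 7, 2, 0). The spacing constant, the square-root scaling by subject count, and the normalized attraction vector are assumptions for illustration; in a Processing sketch PVector would be the natural vector type, but plain float arrays keep the example self-contained. The class name DeweyPoint is hypothetical.

```java
// Sketch of a static Dewey class point: position from digits, size from subject count.
public class DeweyPoint {
    static final float SPACING = 100f;   // hypothetical distance between digit steps
    static final float BASE_SIZE = 2f;   // hypothetical size multiplier
    static final float STRENGTH = 0.05f; // hypothetical attraction strength

    final int dewey;
    final float x, y, z, scale;

    DeweyPoint(int dewey, int subjectCount) {
        this.dewey = dewey;
        // dewey = abc  ->  a -> x, b -> y, c -> z
        int a = dewey / 100, b = (dewey / 10) % 10, c = dewey % 10;
        this.x = a * SPACING;
        this.y = b * SPACING;
        this.z = c * SPACING;
        // Assumption: size grows with the square root of the subject count.
        this.scale = BASE_SIZE * (float) Math.sqrt(subjectCount);
    }

    // Attraction on an agent of this class at (px, py, pz): unit vector toward the point, times STRENGTH.
    float[] attract(float px, float py, float pz) {
        float dx = x - px, dy = y - py, dz = z - pz;
        float d = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
        if (d == 0f) return new float[] {0f, 0f, 0f};
        return new float[] {STRENGTH * dx / d, STRENGTH * dy / d, STRENGTH * dz / d};
    }

    public static void main(String[] args) {
        DeweyPoint p = new DeweyPoint(720, 16);
        System.out.println(p.x + ", " + p.y + ", " + p.z + " scale " + p.scale); // 700.0, 200.0, 0.0 scale 8.0
        float[] f = p.attract(700f, 200f, 100f);
        System.out.println(f[0] + ", " + f[1] + ", " + f[2]);
    }
}
```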


II. Swarm agents that represent subjects
These agents are initialized with random locations and random flying directions. When an agent meets another agent or a Dewey class point within its search distance, it checks whether they are related (through co-occurrence or directly) and how strong the connection is, then calculates the vector of its next movement based on the flocking principles of alignment, cohesion, and separation.
Separation force
steers away from agents that have no relation to it
Alignment force
aligns with the velocity vectors of agents that have a co-occurrence relation with it
Cohesion force
steers toward the flocking center of all related agents nearby
Random force
adds a random vector to the next movement
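One update step of the agents can be sketched in plain Java (Processing sketches are Java-based). This is an illustrative reading of the rules above, not the project's source: relation strength is assumed to be a single co-occurrence count looked up from the per-subject analysis (0 meaning unrelated), unrelated neighbors contribute separation, related neighbors contribute weight-averaged alignment and cohesion, and all weights, the search distance, the speed cap, and the sample subjects are hypothetical. The Dewey attraction force is left out to keep the focus on agent-to-agent rules.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of one relation-aware flocking update; constants are hypothetical.
public class Flock {
    static final float SEARCH = 50f;   // search distance
    static final float W_SEP = 1.5f;   // separation weight (unrelated agents)
    static final float W_ALI = 0.5f;   // alignment weight (related agents)
    static final float W_COH = 0.01f;  // cohesion weight (related agents)
    static final float W_RND = 0.2f;   // random force scale
    static final float MAX_SPEED = 2f;

    static class Agent {
        final String subject;
        final float[] pos, vel;
        Agent(String subject, float[] pos, float[] vel) {
            this.subject = subject; this.pos = pos; this.vel = vel;
        }
    }

    static float dot(float[] a, float[] b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

    static float dist(float[] a, float[] b) {
        float[] d = {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
        return (float) Math.sqrt(dot(d, d));
    }

    // Assumed relation strength: co-occurrence count between two subjects (0 = unrelated).
    static int relation(Map<String, Map<String, Integer>> co, String a, String b) {
        return co.getOrDefault(a, Map.of()).getOrDefault(b, 0);
    }

    static void step(List<Agent> agents, Map<String, Map<String, Integer>> co, Random rng) {
        int n = agents.size();
        float[][] acc = new float[n][3];
        for (int i = 0; i < n; i++) {
            Agent me = agents.get(i);
            float[] sep = new float[3], ali = new float[3], coh = new float[3];
            float wSum = 0f;
            for (Agent other : agents) {
                if (other == me) continue;
                float d = dist(me.pos, other.pos);
                if (d == 0f || d > SEARCH) continue;
                int w = relation(co, me.subject, other.subject);
                for (int k = 0; k < 3; k++) {
                    if (w == 0) {
                        sep[k] += (me.pos[k] - other.pos[k]) / (d * d); // push away, magnitude 1/d
                    } else {
                        ali[k] += w * other.vel[k];  // weighted neighbor velocity
                        coh[k] += w * other.pos[k];  // weighted neighbor position
                    }
                }
                wSum += w;
            }
            for (int k = 0; k < 3; k++) {
                float a = W_SEP * sep[k] + W_RND * (float) rng.nextGaussian();
                if (wSum > 0f) {
                    a += W_ALI * (ali[k] / wSum - me.vel[k]); // steer toward weighted mean velocity
                    a += W_COH * (coh[k] / wSum - me.pos[k]); // steer toward weighted center
                }
                acc[i][k] = a;
            }
        }
        // Apply all accelerations afterwards so every agent reacts to the same snapshot.
        for (int i = 0; i < n; i++) {
            Agent me = agents.get(i);
            for (int k = 0; k < 3; k++) me.vel[k] += acc[i][k];
            float speed = (float) Math.sqrt(dot(me.vel, me.vel));
            if (speed > MAX_SPEED) for (int k = 0; k < 3; k++) me.vel[k] *= MAX_SPEED / speed;
            for (int k = 0; k < 3; k++) me.pos[k] += me.vel[k];
        }
    }

    public static void main(String[] args) {
        // Hypothetical subjects and counts, not taken from the SPL data.
        Map<String, Map<String, Integer>> co = Map.of(
            "Architecture", Map.of("Interior decoration", 3),
            "Interior decoration", Map.of("Architecture", 3));
        List<Agent> agents = List.of(
            new Agent("Architecture", new float[]{0, 0, 0}, new float[]{1, 0, 0}),
            new Agent("Interior decoration", new float[]{10, 0, 0}, new float[]{-1, 0, 0}),
            new Agent("Cooking", new float[]{5, 3, 0}, new float[]{0, 0, 0}));
        Random rng = new Random(42);
        for (int t = 0; t < 10; t++) step(agents, co, rng);
        for (Agent a : agents) {
            System.out.printf("%s -> (%.1f, %.1f, %.1f)%n", a.subject, a.pos[0], a.pos[1], a.pos[2]);
        }
    }
}
```

Computing every acceleration before moving any agent keeps the update independent of list order; updating in place would let agents earlier in the list influence later ones within the same frame.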


The visualization of each agent consists of the following components:
Path
a path line formed by the agent's previous positions; its color ranges from black to red, turning redder the closer the agent gets to the Dewey class points it belongs to, and older position points fade out


Connection to dewey class
a connection line from the agent to the Dewey class points it belongs to; its color ranges from black to red, turning redder the closer the agent gets to those points


Connection to co-occurrent related agents
a connection line from the agent to surrounding agents it has a co-occurrence relation with; its color is the average of the colors at the two endpoints
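The color rules above can be expressed as small functions on packed ARGB integers, the same 0xAARRGGBB layout Processing uses for its color values (in a sketch, lerpColor() could do the blending instead). The linear black-to-red ramp, the MAX_DIST cutoff, and the linear alpha fade by age are assumptions, since the exact mapping curves are not specified; the class and method names are hypothetical.

```java
// Sketch of the agent color rules on packed ARGB ints.
public class AgentColors {
    static final float MAX_DIST = 300f; // hypothetical distance at which the color is fully black

    // 0 = far from its Dewey points (black), 1 = at its Dewey point (red).
    static float redness(float distToOwnDewey) {
        float t = 1f - distToOwnDewey / MAX_DIST;
        return Math.max(0f, Math.min(1f, t));
    }

    static int argb(int a, int r, int g, int b) {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Path point: black-to-red by distance; alpha fades linearly with age
    // (age 0 = newest point, historyLength - 1 = oldest kept point).
    static int pathColor(float dist, int age, int historyLength) {
        int r = Math.round(255 * redness(dist));
        int alpha = Math.round(255 * (1f - (float) age / historyLength));
        return argb(alpha, r, 0, 0);
    }

    // Line between two related agents: per-channel average of the endpoint colors.
    static int averageColor(int c1, int c2) {
        int a = (((c1 >>> 24) & 0xFF) + ((c2 >>> 24) & 0xFF)) / 2;
        int r = (((c1 >> 16) & 0xFF) + ((c2 >> 16) & 0xFF)) / 2;
        int g = (((c1 >> 8) & 0xFF) + ((c2 >> 8) & 0xFF)) / 2;
        int b = ((c1 & 0xFF) + (c2 & 0xFF)) / 2;
        return argb(a, r, g, b);
    }

    public static void main(String[] args) {
        int near = pathColor(0f, 0, 10);    // newest point, at its Dewey point
        int far = pathColor(600f, 0, 10);   // newest point, beyond MAX_DIST
        System.out.println(Integer.toHexString(near));                    // ffff0000
        System.out.println(Integer.toHexString(far));                     // ff000000
        System.out.println(Integer.toHexString(averageColor(near, far))); // ff7f0000
    }
}
```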



Final result
start



end




Code
All work was developed in Processing.
Source Code + Data