Stock Prices
MAT 259, 2021
Zhuowei

Concept
I have been really interested in the stock market recently. I’ve heard people say things like ‘Buy the rumor, sell the news’ or ‘The price drops on good earnings and goes up on bad earnings’, which is kind of counterintuitive. For this project, I’m interested in looking at how news, earnings reports, and social media popularity actually affect stock prices.

Query
We will use four kinds of data: historical stock price data for all stocks in the S&P 500, news data, earnings data, and social media data. Stock price data is commonly used for stock price prediction; combined with news sentiment, it could potentially improve the prediction. Screenshots of some of the datasets are attached.

The earnings data contains:
symbol: ticker of the stock
date: earnings date
qtr: the quarter whose earnings are reported
eps_est: estimated earnings per share
eps: actual earnings per share
release_time: the time of the report release; post means after the market closes, pre means before the market opens

For social media data, I am thinking about getting Reddit data. Some of the important variables in the dataset:
Title: title of the post, from which we can parse out the ticker under discussion
Upvote_ratio: upvote/downvote ratio, which we use as the sentiment towards the post
Total comments: how many comments are under the post, which tells us how popular the post is

I’m still working on the news data. The idea is to scrape the news titles and run sentiment analysis on them to determine whether each one is positive or negative news. We will then use that for our visualization.
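As a rough sketch of how two of these variables could be turned into event features, the snippet below parses tickers out of a post title and computes a relative earnings surprise from eps and eps_est. The names KNOWN_TICKERS, extract_tickers and earnings_surprise are hypothetical, the ticker set is a tiny made-up sample standing in for the S&P 500 list, and the surprise formula is one common convention assumed here, not something fixed by the dataset.

```python
import re

# Hypothetical sample of tickers; in practice this would be the S&P 500 list.
KNOWN_TICKERS = {"AAPL", "MSFT", "TSLA", "NVDA"}

# Candidate tickers: runs of 1-5 capital letters standing alone as a word.
TICKER_PATTERN = re.compile(r"\b[A-Z]{1,5}\b")


def extract_tickers(title, known=KNOWN_TICKERS):
    """Return known tickers mentioned in a title, in order, without repeats."""
    found = []
    for candidate in TICKER_PATTERN.findall(title):
        if candidate in known and candidate not in found:
            found.append(candidate)
    return found


def earnings_surprise(eps, eps_est):
    """Relative surprise (eps - eps_est) / |eps_est|; None if the estimate is 0."""
    if eps_est == 0:
        return None
    return (eps - eps_est) / abs(eps_est)


if __name__ == "__main__":
    print(extract_tickers("$TSLA and AAPL earnings: AAPL beats"))
    print(earnings_surprise(1.10, 1.00))
```

Filtering candidates against a known ticker list avoids treating ordinary capitalized words (for example "I" or "CEO") as tickers; a positive surprise would then count as good earnings and a negative one as bad earnings.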






Preliminary sketches
Design: Use the SPY return as the baseline and plot the relative return of all 500 stocks in the S&P 500 around SPY. When there is a news item, earnings event, or social media discussion, a colored point will be drawn for that stock at that date. The color encodes the kind and degree of the event. For example, green means good news and blue means bad news, with a color gradient representing how good or bad the news is.

I also want to show the aggregated results across all stocks for the entire time period of the data. The idea is to show the return of the stocks 1-5 days after the events. We will categorize the events by degree (good news -> bad news; good earnings -> bad earnings; popular -> not so popular), and for each category we will show the distribution of the returns with boxplots. A sketch is attached.
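The two quantities in this design can be sketched in a few lines of plain Python: the daily return of a stock relative to SPY, and the return over the 1-5 days following an event, grouped by event category so that each group can feed one boxplot. This is only an illustration under assumptions: prices are taken to be lists of daily closing prices aligned by trading day, events are (day index, category) pairs, and all function names are hypothetical.

```python
def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


def relative_returns(stock_prices, spy_prices):
    """Stock daily return minus SPY daily return (SPY is the baseline)."""
    stock = daily_returns(stock_prices)
    spy = daily_returns(spy_prices)
    return [s - b for s, b in zip(stock, spy)]


def forward_return(prices, event_idx, horizon):
    """Return from the event day's close to the close `horizon` days later."""
    end = event_idx + horizon
    if end >= len(prices):
        return None  # not enough data after the event
    return prices[end] / prices[event_idx] - 1


def forward_returns_by_category(prices, events, horizon):
    """Group forward returns by event category, e.g. 'good' or 'bad'."""
    grouped = {}
    for event_idx, category in events:
        r = forward_return(prices, event_idx, horizon)
        if r is not None:
            grouped.setdefault(category, []).append(r)
    return grouped


if __name__ == "__main__":
    stock = [100, 110, 121, 115, 118, 120]   # made-up prices
    spy = [100, 105, 105, 106, 107, 108]     # made-up prices
    print(relative_returns(stock, spy))
    print(forward_returns_by_category(stock, [(0, "good"), (2, "bad")], 2))
```

Each list in the grouped result holds the returns for one event category at one horizon, which is the input a boxplot needs; repeating the call for horizons 1 through 5 would cover the full window described above.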




Process



Final result



Code