This chapter reviews prior work and methods in sentiment analysis and stock market prediction. These studies relate to my hypotheses and motivate the goal of this thesis.
The approach used in this thesis is inspired by the strategy of Bollen et al. [12], extended in two ways: the PageRank algorithm is applied to increase the accuracy of the results, and different sentiment analysis techniques are used. In 2010, Bollen et al. used Twitter data to test how well Twitter sentiment predicts the stock market, and reported high accuracy. They proposed a method for predicting changes in stock market prices based on the mood of people on Twitter.
The second phase is polarity classification, in which the corpus is classified as positive, negative, both, or neutral using 10 different features and 2 simple classifiers. Neutral-polar classification and polarity classification achieved average accuracies of 75.9% and 65.7%, respectively.
Beyond the effects of Twitter data and sentiment analysis, many researchers have been drawn to other kinds of prediction and their applications in various fields. Oghina et al. [28] predicted Internet Movie Database (IMDB) ratings using information retrieved from social media. They hypothesized that a correlation can be found between IMDB ratings and social media signals related to movie artifacts (e.g. the movie title). In 2010, Tumasjan et al. [29] analyzed tweets that mentioned political parties and politicians during the 2009 German election and found that the volume of tweets reflects voter preferences. Other studies suggest that Twitter can be used for real-time detection of events such as earthquakes [30] and for tracking the spread of diseases [31].
There is also prior work on correlations between social media and the stock market. In 2010, Gilbert and Karahalios [32] estimated anxiety and fear from a dataset of over 20 million posts from the LiveJournal site. They calculated an index of the US national mood known as the Anxiety Index. The results of this experiment showed that when the Anxiety Index rose, the S&P 500 market ended
In this paper we will look at search engines, social media and blogging, and try to weigh up
In the paper by Nguyen et al. \cite{Micro:Ng}, the model makes use of the micro-reviews \cite{Micro:Ng} that users generate on various social media sites about a particular entity. Micro-reviews are reviews that are short, easy to comprehend, and considered the most appropriate form of customer feedback. The task is becoming more complicated as the number of micro-reviews grows, because it is hard to go through several thousand user reviews to find the one best suited to a user's preferences. To overcome this, the reviews are categorised as either positive or negative feedback. These micro-reviews are then associated and
Efficient capital market: “It was generally believed that securities markets were extremely efficient in reflecting information about the stock market as a whole” (Fama 1970). So much so that when there is new information about a stock's rise, the news is dispersed immediately and affects the security's price at that time.
An abundant literature studies the causal relationship between macroeconomic variables and stock prices, across various scopes and time periods. The majority of these studies report an association between macro indicators and stock market movement, running forward, backward, or in both directions. Some of the previous research on this topic is discussed here, and its conclusions are found to support the present study.
As in our study, LDA topics have improved the accuracy of finding keywords for different topics. In this work we examine the social aspects of food tweeting behavior, and provide some support for a social affinity that is not local in the geographic sense. Several recent studies probe the viability of public health surveillance by measuring relevant textual signals in social media. Prier, Smith, Giraud-Carrier, and Hanson [5] examine all the words people use in online reviews, and draw insights on correlating terms and concepts that may not seem immediately relevant to the hygiene status of restaurants. The work draws on the rich body of research that studies online reviews for sentiment analysis.
Sentiment analysis concentrates on attitudes, whereas traditional text mining focuses mainly on the analysis of facts. There are three main fields of research in sentiment analysis: sentiment classification, feature-based sentiment classification, and opinion summarization. Sentiment classification deals with classifying an entire document according to the opinions it expresses about a certain object. Feature-based sentiment classification considers the opinions on individual features of a certain object. Opinion summarization differs from traditional text summarization because it examines only the features of the product on which customers have expressed opinions via social media. Unlike classic text summarization, it does not summarize the reviews by selecting a subset of the original sentences, or rewriting some of them, to capture the main points.
The main objective of this article is to summarize, evaluate, and offer a critical view of the paper by Baker & Wurgler (2007). The first section presents a review of the article, the second discusses our main criticism of the econometric methodology, the third analyses the predictive power of investor sentiment, and the last presents the conclusions.
Index Terms— Location estimation, Traffic congestion, Tweet classification, Text mining, Twitter, Naïve Bayes Classifier, Real-time event detection.
First, we derive the collective opinion $O$ of a whole data sample $D$, using an SA method from the literature. Specifically, we adopt in this work the method SACI \cite{jws2015}. SACI is relevant to our goal since it was originally proposed for efficiently estimating collective sentiment on data samples, instead of aggregating the sentiment derived for each individual document. Further, the authors demonstrated that SACI is more effective in estimating the collective opinion than aggregation-based SA methods. SACI represents $O$ as a probability distribution over the sentiment classes positive, negative and neutral. Next, we split $D$ into time units of equal size (e.g., days, weeks, months). Then, we estimate the collective opinion $O_t$ using only the posts belonging to each distinct time unit $t$. Finally, we perform a visual inspection of the derived distributions. The more dynamic a domain, the more the opinions estimated on distinct time units differ.
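The time-unit split described above can be sketched as follows. This is a minimal illustration and not SACI itself: as a stand-in, it assumes each post already carries a sentiment label and simply normalizes label counts per time unit, which is the aggregation-based approach SACI is meant to replace. The names `opinion_by_time_unit` and `total_variation` are hypothetical, and the total variation distance is only one possible numeric companion to the visual inspection.

```python
from collections import Counter, defaultdict
from datetime import date

CLASSES = ("positive", "negative", "neutral")

def opinion_by_time_unit(posts, unit_of):
    """Estimate O_t for each time unit t.

    posts:   iterable of (date, label) pairs, label in CLASSES.
    unit_of: function mapping a date to its time-unit key (day, week, month).
    NOTE: label counting is a stand-in for SACI, not the SACI method.
    """
    counts = defaultdict(Counter)
    for day, label in posts:
        counts[unit_of(day)][label] += 1
    distributions = {}
    for t, c in counts.items():
        total = sum(c.values())
        distributions[t] = {k: c[k] / total for k in CLASSES}
    return distributions

def total_variation(p, q):
    """Distance between two class distributions: 0 (identical) to 1 (disjoint)."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in CLASSES)

# Toy data sample D, split into months.
posts = [
    (date(2015, 1, 5), "positive"),
    (date(2015, 1, 6), "positive"),
    (date(2015, 1, 20), "negative"),
    (date(2015, 2, 3), "negative"),
    (date(2015, 2, 4), "neutral"),
]
by_month = opinion_by_time_unit(posts, unit_of=lambda d: (d.year, d.month))
drift = total_variation(by_month[(2015, 1)], by_month[(2015, 2)])
```

A larger `drift` between consecutive time units would suggest a more dynamic domain, in the sense used above.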
Sentiment analysis, or opinion mining, is an emerging area of research because the impact of the web is increasing very quickly and most people now like to share their opinions, feelings and experiences on the web. People commonly use blogs, forums, e-news, review channels and social networking platforms such as
Stock market movements are complex and difficult to predict with certainty. Strategies tend to work for a while before they fail terribly. Mathematical and statistical formulae don’t hold for long. This is mainly because some of the factors influencing the demand for and supply of a stock cannot be quantified and may be Poisson distributed. Historical patterns are broken every now and then, while fundamental data affects stocks differently at different times of an economic cycle. However, there appears to be a correlation between stock market movements and the atmosphere in social media. This is yet to be tested in this study, even though there are numerous studies concerning this topic. However, through the AP’s Twitter account,
Twitter is, without question, one of the most active, popular social platforms on the web.
In the first phase, the extraction patterns are applied to a large corpus of text obtained from Twitter to yield a set of subjective terms. In the second phase, the extracted terms are assigned a polarity based on the normalized pointwise mutual information score between them and positive and negative terms derived from an existing polarity lexicon. Details of the process are presented in the following subsections.
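The second phase can be illustrated with a small sketch. It assumes document-level co-occurrence counts and scores a term as its mean normalized PMI with the positive seed terms minus its mean normalized PMI with the negative seed terms. The source does not state how the two sets of scores are combined, so that difference, the toy corpus, the seed sets, and all function names here are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count, per term and per term pair, the number of documents containing them."""
    term_counts, pair_counts = Counter(), Counter()
    for doc in docs:
        terms = set(doc)
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_counts, pair_counts

def npmi(x, y, term_counts, pair_counts, n_docs):
    """Normalized PMI in [-1, 1]: PMI(x, y) / -log p(x, y)."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0:
        return -1.0  # limit value when the terms never co-occur
    if joint == n_docs:
        return 0.0   # both terms in every document: 0/0, treated as 0 here
    p_xy = joint / n_docs
    p_x, p_y = term_counts[x] / n_docs, term_counts[y] / n_docs
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)

def polarity(term, pos_seeds, neg_seeds, term_counts, pair_counts, n_docs):
    """Mean NPMI with positive seeds minus mean NPMI with negative seeds (assumed rule)."""
    def mean_npmi(seeds):
        seeds = [s for s in seeds if s in term_counts and s != term]
        if not seeds:
            return 0.0
        return sum(npmi(term, s, term_counts, pair_counts, n_docs) for s in seeds) / len(seeds)
    return mean_npmi(pos_seeds) - mean_npmi(neg_seeds)

# Toy tokenized tweets; "great"/"awful" stand in for lexicon seed terms.
docs = [
    ["battery", "great", "love"],
    ["battery", "great"],
    ["screen", "awful"],
    ["screen", "awful", "hate"],
    ["battery", "screen"],
]
term_counts, pair_counts = cooccurrence_counts(docs)
score_battery = polarity("battery", {"great"}, {"awful"}, term_counts, pair_counts, len(docs))
score_screen = polarity("screen", {"great"}, {"awful"}, term_counts, pair_counts, len(docs))
```

In this toy corpus `score_battery` comes out positive and `score_screen` negative, because each term co-occurs with only one of the two seeds.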
In the sentiment extraction phase, the candidate’s name extracted from the resume is used to search Twitter for their profile. The candidate’s tweets are retrieved for further processing using the Tweepy package and its API [13]. These tweets are then analysed to extract the sentiments for each candidate. This is done using the TextBlob package, a popular natural language processing library [14].
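A minimal sketch of this phase is given below. It assumes Tweepy 4.x with a Twitter API v2 bearer token (the original work may have used a different Tweepy interface), assumes the candidate's Twitter handle has already been resolved from their name, and summarizes each candidate by the mean TextBlob polarity of their tweets, which is an illustrative choice rather than the authors' stated rule. The function names are hypothetical; both libraries are imported lazily so the aggregation step can be exercised without them.

```python
from statistics import mean

def fetch_tweets(handle, bearer_token, limit=50):
    """Return recent tweet texts for a handle (assumes Tweepy 4.x, API v2)."""
    import tweepy  # imported lazily; requires `pip install tweepy`
    client = tweepy.Client(bearer_token=bearer_token)
    user = client.get_user(username=handle)
    if user.data is None:  # no such profile
        return []
    # max_results must be between 5 and 100 for this endpoint.
    response = client.get_users_tweets(id=user.data.id, max_results=limit)
    return [tweet.text for tweet in (response.data or [])]

def textblob_polarity(text):
    """Polarity in [-1, 1] as reported by TextBlob's default analyzer."""
    from textblob import TextBlob  # imported lazily; requires `pip install textblob`
    return TextBlob(text).sentiment.polarity

def candidate_sentiment(tweets, scorer=textblob_polarity):
    """Summarize a candidate's tweets by mean polarity (assumed aggregation)."""
    scores = [scorer(t) for t in tweets]
    if not scores:
        return {"mean_polarity": 0.0, "n_tweets": 0}
    return {"mean_polarity": mean(scores), "n_tweets": len(scores)}
```

With credentials in place, usage would look like `candidate_sentiment(fetch_tweets("some_handle", token))`, where `some_handle` and `token` are placeholders. Passing a different `scorer` makes it possible to swap in another sentiment method without touching the retrieval code.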
Opinion mining and sentiment analysis are discussed in chapter 7 of Amy Van Looy’s Social Media Management: Technologies and Strategies for Creating Business Value (pp. 133-147): “Opinion mining and sentiment analysis can be defined as processing “a set of search results for a given item,