Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques
Keywords:
Classification, Machine Learning, Social Media, Twitter Data
Abstract
With an average of five hundred million tweets sent per day, Twitter has become one of the largest platforms for data analysis by researchers. Previously, various studies have been conducted on Twitter data, for example on sentiment analysis. However, little research has been done on classifying tweets into categories so that they can be distributed according to user preferences. In this research, we started by creating four broad categories: politics, sports, crime and natural. We then applied six machine learning techniques (Random Forest, K-Nearest Neighbors, Naïve Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we compared the results in terms of sensitivity, specificity, precision, false positive rate and accuracy. We found that Support Vector Machine (SVM) produced the best results on all five of these measures. Hence, we concluded that a machine learning approach, specifically SVM, can be used to classify Twitter data. The constructed dataset, all programs, figures and snippets can be found at https://github.com/ananyasarkertonu/Twitter-Dataset
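The five comparison measures named in the abstract can all be derived from a confusion matrix by treating each category in turn as the positive class (one-vs-rest). The following is a minimal sketch of that calculation, not the authors' code: the function name `one_vs_rest_metrics`, the helper `safe_div` and the counts in `TOY_CONFUSION` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# The four categories named in the abstract.
CATEGORIES = ["politics", "sports", "crime", "natural"]

# Hypothetical counts, NOT results from the paper.
# TOY_CONFUSION[i][j] = tweets whose true class is i and predicted class is j.
TOY_CONFUSION = [
    [8, 1, 1, 0],
    [1, 9, 0, 0],
    [2, 0, 7, 1],
    [0, 0, 1, 9],
]


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def one_vs_rest_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute per-class metrics, treating each class as 'positive' in turn."""
    total = sum(sum(row) for row in confusion)
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # true positives
        fn = sum(confusion[k]) - tp                   # missed members of class k
        fp = sum(row[k] for row in confusion) - tp    # others predicted as k
        tn = total - tp - fn - fp                     # everything else
        results[label] = {
            "sensitivity": safe_div(tp, tp + fn),     # recall, true positive rate
            "specificity": safe_div(tn, tn + fp),     # true negative rate
            "precision": safe_div(tp, tp + fp),
            "false_positive_rate": safe_div(fp, fp + tn),
            "accuracy": safe_div(tp + tn, total),
        }
    return results


if __name__ == "__main__":
    for label, scores in one_vs_rest_metrics(TOY_CONFUSION, CATEGORIES).items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
```

Under this formulation the false positive rate is always 1 minus the specificity, so the two measures rank classifiers identically; reporting both is a presentational choice.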