Using Big Data Technique for Building Edit Alert System for Wikipedia Infoboxes Based on Map Reduce Method
Keywords:
Big Data, Map-Reduce, Wikipedia Infobox, update alert, bipartite graph
Abstract
Wikipedia is an online encyclopedia that has become a vital information resource, both for its users and for the many knowledge bases derived from it. Its information must be edited manually to stay up to date. Many Wikipedia articles include an infobox on the right-hand side of the page; an infobox generally summarizes key facts from the article as attribute-value pairs. Because all of Wikipedia's content is updated and maintained manually by contributors, its information is not always updated regularly or completely. In this paper, we present a novel system that uses time series modeling to predict which data items are most likely to be updated, based on features such as the page category, the record key, and the time of the last update, and that alerts Wikipedia editors about data items that may soon need updating. A bipartite graph is used to perform user-based collaborative filtering, which finds similar editors who might be interested in editing the infobox. The update alert is sent to these editors as well as to the past editors of that infobox. We also discuss a technique for handling vandalism and erroneous edits and give its analysis. Finally, we present various tasks that can be carried out on infoboxes.
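The abstract does not specify which time series model is used. As a minimal sketch, assuming the edit timestamps of each infobox attribute are available, the prediction step can be approximated by forecasting the next gap between edits with simple exponential smoothing (a stand-in chosen for illustration, not necessarily the authors' model) and flagging the attribute when the predicted update falls inside an alert horizon. The record layout `AttributeHistory`, the helper names, the 30-day horizon, and the fallback that borrows gaps from attributes with the same category and key are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AttributeHistory:
    """Edit history of one infobox attribute (hypothetical record layout)."""
    page: str
    category: str                                   # Wikipedia category of the page
    key: str                                        # infobox attribute, e.g. "population"
    edit_times: list = field(default_factory=list)  # datetimes of past edits


def gaps_in_days(times):
    """Intervals between consecutive edits, in days, for sorted timestamps."""
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]


def smooth(gaps, alpha=0.5):
    """Simple exponential smoothing; the final level forecasts the next gap."""
    level = gaps[0]
    for g in gaps[1:]:
        level = alpha * g + (1 - alpha) * level
    return level


def predict_due(history, peers, now, horizon=timedelta(days=30), alpha=0.5):
    """Predicted next-update time if it falls before now + horizon, else None."""
    times = sorted(history.edit_times)
    if not times:
        return None
    gaps = gaps_in_days(times)
    if not gaps:
        # Only one edit so far: borrow the update rhythm of attributes with the
        # same category and key (an assumed fallback, not taken from the paper).
        gaps = [g for p in peers
                if p.category == history.category and p.key == history.key
                for g in gaps_in_days(sorted(p.edit_times))]
    if not gaps:
        return None
    predicted = times[-1] + timedelta(days=smooth(gaps, alpha))
    return predicted if predicted <= now + horizon else None


# Usage with made-up sample data.
city = AttributeHistory("City_A", "Cities", "population",
                        [datetime(2020, 1, 1), datetime(2021, 1, 1), datetime(2022, 1, 1)])
new_city = AttributeHistory("City_B", "Cities", "population", [datetime(2022, 1, 1)])
now = datetime(2022, 12, 15)
print(predict_due(city, [], now))          # 2023-01-01 12:00:00
print(predict_due(new_city, [city], now))  # 2023-01-01 12:00:00
```

Sorting the flagged attributes by their predicted time would yield a list of items most likely to need an update soon. A richer model could use the category and key as explicit features rather than only as a grouping for the fallback.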
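The editor-recommendation step can be sketched in the same spirit, assuming edits are available as (editor, infobox) pairs that form the edges of a bipartite graph. The code below simulates two MapReduce jobs in plain Python and does not use the Hadoop API: the first job groups edges by infobox and emits every pair of editors who share an infobox, and the second sums those co-edit counts. Cosine similarity over the editors' binary infobox vectors is an assumed choice of measure, since the abstract does not name one; the function names, the parameter `k`, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


# Job 1: key each bipartite edge by infobox, then emit a count of 1 for
# every pair of editors who edited that infobox.
def map_by_infobox(edge):
    editor, infobox = edge
    yield infobox, editor


def reduce_editor_pairs(infobox, editors):
    for a, b in combinations(sorted(set(editors)), 2):
        yield (a, b), 1


# Job 2: sum the co-edit counts for each editor pair.
def map_identity(kv):
    yield kv


def reduce_sum(pair, counts):
    yield pair, sum(counts)


def editor_similarity(edges):
    """Cosine similarity between editors' binary infobox vectors."""
    edges = set(edges)  # repeated edits of the same infobox count once
    degree = defaultdict(int)
    for editor, _ in edges:
        degree[editor] += 1
    co_edits = run_mapreduce(run_mapreduce(edges, map_by_infobox, reduce_editor_pairs),
                             map_identity, reduce_sum)
    return {pair: co / sqrt(degree[pair[0]] * degree[pair[1]]) for pair, co in co_edits}


def alert_recipients(edges, infobox, k=2):
    """Past editors of `infobox` plus the k most similar editors who have not edited it."""
    past = {e for e, i in edges if i == infobox}
    scores = defaultdict(float)
    for (a, b), s in editor_similarity(edges).items():
        if a in past and b not in past:
            scores[b] += s
        elif b in past and a not in past:
            scores[a] += s
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return past | set(ranked[:k])


# Usage with made-up sample data.
edges = [("alice", "Infobox_A"), ("bob", "Infobox_A"), ("bob", "Infobox_B"),
         ("carol", "Infobox_B"), ("carol", "Infobox_C"), ("dave", "Infobox_C")]
print(sorted(alert_recipients(edges, "Infobox_A")))  # ['alice', 'bob', 'carol']
```

Because each infobox is reduced independently, the first job parallelizes across infoboxes, which is one reason a MapReduce formulation suits this task at Wikipedia scale. The number of emitted pairs grows quadratically with an infobox's editor count, so very heavily edited infoboxes might need sampling.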