Using Big Data Technique for Building Edit Alert System for Wikipedia Infoboxes Based on Map Reduce Method


  • Khushboo Bhatia, Student, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
  • Arnab Halder, Student, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
  • Yashi Yadav, Student, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
  • Ankush Sarsewar, Student, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
  • Priyanka Singh, Student, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
  • Khushboo Khurana, Assistant Professor, Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India


Big Data, Map-Reduce, Wikipedia Infobox, update alert, bipartite graph


Wikipedia is an online encyclopedia that has become a vital information resource for users and for the many knowledge bases derived from it. Many articles carry an infobox on the right-hand side, which summarizes key facts from the article as attribute-value pairs. All of Wikipedia's content is updated and maintained manually by contributors, so its information is not always updated regularly or completely. In this paper, we present a novel system that uses time series modeling to predict the data items most likely to be updated, based on the category of the page, the record key, the time of the last update, etc., in order to alert Wikipedia editors to data items that might need an update soon. A bipartite graph is used to perform user-based collaborative filtering, which finds similar editors who might be interested in editing the infobox. The update alert is sent to the editors found through the bipartite graph along with the past editors of the particular infobox. A technique for dealing with vandalism and erroneous edits is also discussed and analyzed. We also present various tasks that can be carried out on infoboxes.
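As an illustration of the editor-selection idea, the following is a minimal sketch and not the system described in the paper. It assumes a hypothetical edit log of (editor, infobox) pairs, builds the editor-infobox bipartite graph as two adjacency maps, and treats two editors as similar when the Jaccard overlap of the infoboxes they have edited reaches a threshold. The names `EDITS`, `alert_recipients`, and `min_similarity`, the sample data, and the choice of Jaccard similarity are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical edit log: (editor, infobox) pairs. Illustrative data only.
EDITS = [
    ("alice", "Nagpur"),
    ("alice", "Mumbai"),
    ("bob", "Nagpur"),
    ("bob", "Mumbai"),
    ("bob", "Pune"),
    ("carol", "Pune"),
    ("dave", "Berlin"),
]


def build_bipartite(edits):
    """Return both adjacency maps of the editor-infobox bipartite graph."""
    editor_to_boxes = defaultdict(set)
    box_to_editors = defaultdict(set)
    for editor, box in edits:
        editor_to_boxes[editor].add(box)
        box_to_editors[box].add(editor)
    return editor_to_boxes, box_to_editors


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def alert_recipients(infobox, edits, min_similarity=0.3):
    """Past editors of the infobox plus editors similar to any of them.

    min_similarity is an assumed threshold, not a value from the paper.
    """
    editor_to_boxes, box_to_editors = build_bipartite(edits)
    past = set(box_to_editors.get(infobox, set()))
    similar = set()
    for candidate, boxes in editor_to_boxes.items():
        if candidate in past:
            continue
        # A candidate qualifies if it overlaps enough with any past editor.
        if any(jaccard(boxes, editor_to_boxes[p]) >= min_similarity for p in past):
            similar.add(candidate)
    return past | similar


if __name__ == "__main__":
    print(sorted(alert_recipients("Mumbai", EDITS)))
```

In this toy data, an alert for the "Mumbai" infobox would go to its past editors and also to an editor who has never touched it but shares edited infoboxes with one of them. On a large edit history the pairwise similarity step is the part that would be distributed, for example as a Map-Reduce job keyed by infobox.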








How to Cite

Using Big Data Technique for Building Edit Alert System for Wikipedia Infoboxes Based on Map Reduce Method. (2018). International Journal of Innovative Research in Computer Science & Technology, 6(4), 49-55.