A Robust Multi-Keyword Text Content Retrieval by Utilizing Hash Indexing

Authors

  • Mohamed Manzoor Ul Hassan, Business Analyst, ATOS, Briggs & Stratton University, University of the Cumberlands, Milwaukee, Wisconsin, USA

Keywords:

Information Retrieval, Text Feature, Text Mining, Text Ontology

Abstract

The growing volume of digital content on servers makes storage and retrieval increasingly difficult, so researchers in this field work on organizing content for fast retrieval while keeping data secure. This paper addresses the retrieval of digital text content held as documents and files. A user searches for a desired file with a text query, and a list of relevant files is returned. Keywords are extracted from the text content after noisy data is removed during pre-processing. Each pre-processed keyword (term or word) is identified by a number known as its term ID. Based on these term IDs, each text document receives a hash index, whose entries serve as key numbers in the document index. Because content terms and user queries are compared through hash-based searching over term IDs rather than the words themselves, the privacy of the comparison is maintained. Since documents are identified by their hash index keys, the text content is stored as encrypted numbers; once a document is selected for reading, it is decrypted for that particular user. Experiments were carried out on real and artificial text datasets covering different topics. The results indicate that the proposed model of hash indexing and term-based retrieval improves privacy while keeping the returned results relevant to the query.
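
The pipeline described in the abstract (pre-processing, term IDs, a hash index keyed by those IDs, and multi-keyword lookup) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the use of a truncated HMAC-SHA256 as the term ID, the ranking by number of matched terms, and all function names (`preprocess`, `term_id`, `build_index`, `search`) are assumptions chosen for illustration. The encryption and decryption of stored documents is omitted.

```python
import hashlib
import hmac
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "by"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_id(term, secret_key):
    """Map a term to a numeric ID with a keyed hash (assumed: HMAC-SHA256, first 8 bytes)."""
    digest = hmac.new(secret_key, term.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def build_index(documents, secret_key):
    """Build a hash index mapping each term ID to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in preprocess(text):
            index[term_id(term, secret_key)].add(name)
    return index


def search(query, index, secret_key):
    """Rank documents by the number of distinct query term IDs they match."""
    scores = defaultdict(int)
    for tid in {term_id(t, secret_key) for t in preprocess(query)}:
        for name in index.get(tid, ()):
            scores[name] += 1
    # Highest score first; ties broken by document name for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret shared by indexer and querier
    docs = {
        "doc1.txt": "Hash indexing for text retrieval",
        "doc2.txt": "Text mining and ontology",
        "doc3.txt": "Image compression on servers",
    }
    index = build_index(docs, key)
    print(search("hash text retrieval", index, key))
```

In this sketch the index holds only numeric term IDs, so the comparison between query and content never touches the plain words, which is one plausible reading of the privacy property claimed in the abstract.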

Published

2021-03-30

How to Cite

A Robust Multi-Keyword Text Content Retrieval by Utilizing Hash Indexing. (2021). International Journal of Innovative Research in Computer Science & Technology, 9(2), 1–5. Retrieved from https://acspublisher.com/journals/index.php/ijircst/article/view/11559