Software Bug Reports: Automatic Keyword and Sentence-Based Text Summarization Using Artificial Intelligence

Authors

  • Zaid Altaf, M. Tech Scholar, Department of Computer Science & Engineering, RIMT University, Mandi Gobindgarh, Punjab, India
  • Ashish Oberoi, Assistant Professor, Department of Computer Science & Engineering, RIMT University, Mandi Gobindgarh, Punjab, India

DOI:

https://doi.org/10.55524/

Keywords:

Rapid Automatic Keyword Extraction, Text Summarization, Fuzzy C-Means, Bug Reports, Hierarchical Clustering, Rule Engine

Abstract

Text summarization aims to extract the most important information from documents quickly and accurately. The proposed unsupervised method generates complete and informative summaries of bug reports (software artefacts). It uses Rapid Automatic Keyword Extraction (RAKE) together with term frequency-inverse document frequency (TF-IDF) to identify relevant keywords and phrases. During sentence extraction, fuzzy C-means clustering prioritises sentences whose degree of membership in a cluster exceeds a predefined threshold. A rule engine then performs the final sentence selection: its rules are developed from domain knowledge and operate on the keywords and sentences chosen by the clustering process. The method produces a logical and well-organized summary of Apache bug reports. The extracted summary is further refined with hierarchical clustering, which removes unnecessary details and reorders the remaining content. The Apache Project Bug Report Corpus (APBRC) and the original Bug Report Corpus are used to evaluate the effectiveness of the proposed method. Performance is measured with precision, recall, F-score, and pyramid precision. Experimental results show that the proposed method significantly outperforms state-of-the-art baseline methods such as BRC and LRCA, and that it achieves substantial gains over prior unsupervised methods such as Hurried and Centroid. It extracts the most relevant keyword phrases and sentences from each cluster to offer comprehensive coverage and a coherent summary. On the APBRC corpus, the average precision, recall, F-score, and pyramid precision are 78.22%, 82.18%, 80.10%, and 81.66%, respectively.
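The sentence-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs the standard fuzzy C-means update equations on toy 2-D vectors (the paper would presumably cluster TF-IDF sentence vectors), and the function names, the fuzzifier `m = 2.0`, and the threshold `0.8` are assumptions chosen for illustration, since the abstract does not state the values used.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    dim = len(points[0])
    centers = []
    for _ in range(n_iter):
        # Centers: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append([
                1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for dk in dists
            ])
        change = max(abs(a - b)
                     for old, new in zip(u, new_u)
                     for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break
    return centers, u


def select_sentences(memberships, threshold=0.8):
    """Indices of sentences whose strongest membership exceeds the threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    return [i for i, row in enumerate(memberships) if max(row) > threshold]


# Toy "sentence vectors": two tight groups plus one in-between vector.
vectors = [[0, 0], [0, 1], [1, 0],
           [10, 10], [10, 11], [11, 10],
           [5.3, 5.4]]
centers, memberships = fuzzy_c_means(vectors, n_clusters=2)
print(select_sentences(memberships, threshold=0.8))
```

In this toy setup the vectors that sit close to one of the two groups receive a membership near 1 and pass the threshold, while the in-between vector is split roughly evenly between the clusters and is dropped. A rule engine such as the one the abstract mentions would then operate on the surviving sentences.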




Published

2022-11-30

How to Cite

Software Bug Reports: Automatic Keyword and Sentence-Based Text Summarization Using Artificial Intelligence. (2022). International Journal of Innovative Research in Computer Science & Technology, 10(6), 101–109. https://doi.org/10.55524/