Software Bug Reports: Automatic Keyword and Sentence-Based Text Summarization Using Artificial Intelligence
DOI:
https://doi.org/10.55524/
Keywords:
Rapid Automatic Keyword Extraction, Text Summarization, Fuzzy C-Means, Bug Reports, Hierarchical Clustering, Rule Engine
Abstract
Text summarization aims to extract the most important information from a document quickly and accurately. The proposed unsupervised method produces complete and informative summaries of bug reports (software artefacts). It uses Rapid Automatic Keyword Extraction (RAKE) together with term frequency-inverse document frequency (TF-IDF) weighting to identify relevant keywords and phrases. During sentence extraction, fuzzy C-means clustering prioritises sentences whose degree of membership in a cluster exceeds a predefined threshold. A rule engine then selects the summary sentences; its rules are developed from domain knowledge and operate on the keywords and sentences chosen by the clustering process. Hierarchical clustering refines the extracted summary by removing unnecessary details and reordering the remaining content, so that the method produces a logical and well-organized summary of Apache bug reports. The Apache Project Bug Report Corpus (APBRC) and the original Bug Report Corpus are used to evaluate the effectiveness of the proposed method, with precision, recall, pyramid precision, and F-score as performance measures. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art baseline methods such as BRC and LRCA. It also achieves substantial gains over prior unsupervised methods such as Hurried and Centroid. By extracting the most relevant keyword phrases and sentences from each cluster, it offers comprehensive coverage and a coherent summary. The average values for precision, recall, F-score, and pyramid precision on the APBRC corpus are 78.22%, 82.18%, 80.10%, and 81.66%, respectively.
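The sentence-extraction step described in the abstract can be illustrated with a minimal sketch: sentences are turned into TF-IDF vectors, clustered with fuzzy C-means, and kept when their highest cluster membership reaches a threshold. This is not the authors' implementation. The abstract does not state the number of clusters, the fuzzifier, or the threshold value, so `k`, `m`, `threshold`, the deterministic initialisation, and all function names below are assumptions chosen for illustration. RAKE, the rule engine, and the hierarchical-clustering refinement are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one TF-IDF vector per sentence (assumes non-empty sentences)."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    return [
        [(Counter(d)[w] / len(d)) * math.log(n / df[w]) for w in vocab]
        for d in docs
    ]


def memberships(vectors, centroids, m=2.0):
    """Standard fuzzy C-means membership of each vector in each cluster."""
    exp = 2.0 / (m - 1.0)
    u = []
    for x in vectors:
        dists = [math.dist(x, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centroids:
            # split full membership among those clusters.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists])
    return u


def update_centroids(vectors, u, m=2.0):
    """Recompute each centroid as the membership-weighted mean of the vectors."""
    dim = len(vectors[0])
    centroids = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centroids.append(
            [sum(w * x[d] for w, x in zip(weights, vectors)) / total for d in range(dim)]
        )
    return centroids


def fuzzy_c_means(vectors, k, m=2.0, iters=50):
    """Run a fixed number of fuzzy C-means iterations.

    Assumption: centroids start at the first k vectors so the run is
    deterministic; a real system would likely use random restarts.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        u = memberships(vectors, centroids, m)
        centroids = update_centroids(vectors, u, m)
    return memberships(vectors, centroids, m)


def select_sentences(sentences, u, threshold):
    """Keep sentences whose strongest cluster membership reaches the threshold."""
    return [s for s, row in zip(sentences, u) if max(row) >= threshold]


# Hypothetical bug-report sentences, invented for this example.
sentences = [
    "app crashes on startup",
    "crash occurs on startup after update",
    "thanks for the report",
    "thanks will look into the report",
]
u = fuzzy_c_means(tfidf_vectors(sentences), k=2)
summary = select_sentences(sentences, u, threshold=0.6)
```

Raising `threshold` keeps only sentences that belong clearly to one cluster, while lowering it admits sentences that straddle several topics; the value 0.6 here is arbitrary.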