Bug Severity Prediction using Keywords in Imbalanced Learning Environment

Full Text (PDF, 216KB), PP.53-60



Jayalath Ekanayake 1,*

1. Dept. of Computer Science and Informatics, Faculty of Applied Sciences, Uva Wellassa University, Badulla, 90000, Sri Lanka

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2021.03.04

Received: 22 Oct. 2020 / Revised: 26 Dec. 2020 / Accepted: 25 Feb. 2021 / Published: 8 Jun. 2021

Index Terms

Bug report classification, bug severity level, topic modeling, candidate keywords, classification algorithms


Reported bugs in software systems are classified into severity levels before they are fixed. Bug reports may not be evenly distributed across the severity levels. However, most severity prediction models in the literature assume that the underlying data is evenly distributed, which may not hold in all cases. The aim of this study is therefore to develop bug classification models from unevenly distributed datasets and to test them accordingly.
To that end, the topics or keywords in developers' descriptions of bug reports are first extracted using the Rapid Automatic Keyword Extraction (RAKE) algorithm and then converted into numerical attributes, which, combined with the severity levels, form the datasets. These datasets are used to build classification models with the Naïve Bayes, Logistic Regression, and Decision Tree Learner algorithms. Because the models are learnt from skewed data, their prediction quality is measured using the Area Under the Receiver Operating Characteristic Curve (AUC).
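The keyword-extraction step can be illustrated with a minimal sketch of RAKE-style scoring, in which candidate phrases are split at stopwords and punctuation and each phrase is scored as the sum of deg(w)/freq(w) over its words. This is not the paper's implementation: the stopword list, the `top_k` cutoff, the binary keyword-presence encoding, and the sample bug descriptions are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE uses a much larger one).
STOPWORDS = {"a", "an", "the", "is", "are", "when", "on", "in",
             "of", "to", "and", "with", "after", "it"}


def candidate_phrases(text):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each candidate phrase as the sum of deg(w)/freq(w) over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}


def keyword_features(reports, top_k=2):
    """Encode reports as binary keyword-presence vectors over a shared vocabulary."""
    top = []
    for text in reports:
        scores = rake_scores(text)
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        top.append(set(ranked[:top_k]))
    vocabulary = sorted(set().union(*top))
    matrix = [[1 if kw in kws else 0 for kw in vocabulary] for kws in top]
    return vocabulary, matrix


# Hypothetical bug descriptions, not taken from the paper's datasets.
reports = [
    "Application crashes when saving the large file",
    "Login button is misaligned on the settings page",
]
vocabulary, matrix = keyword_features(reports)
```

Each row of `matrix`, paired with the report's severity label, would form one training instance for a classifier.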
According to the results, the prediction quality of the Logistic Regression model is 0.65 AUC, whereas the other two models recorded a maximum of 0.60 AUC. Although the datasets contain comparatively few instances of the high-severity classes, Blocking and High, the Logistic Regression models predict these two classes with a decent AUC of 0.65. Hence, this project shows that models can be trained on highly skewed datasets so that their prediction quality is equally good across all classes, regardless of the number of instances representing each class. Further, this project emphasizes that models trained in imbalanced learning environments should be evaluated using appropriate metrics. This work also shows that the Logistic Regression model, like Naïve Bayes, which is well known for this task, is capable of classifying documents.
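Why the choice of metric matters on skewed data can be seen in a small sketch that computes AUC as the probability that a randomly chosen positive instance is scored above a randomly chosen negative one (ties count as 0.5). The labels and scores below are hypothetical toy values, not results from the paper, and the one-vs-rest framing for a single severity class is an assumption.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(predictions, labels):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 2 high-severity reports (label 1) among 10.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A trivial model that always predicts the majority class.
majority_acc = accuracy([0] * 10, labels)
majority_auc = auc([0.0] * 10, labels)

# A model whose scores rank the rare class reasonably well.
ranked_scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0]
ranked_auc = auc(ranked_scores, labels)
```

In this toy setting the majority-class model reaches 0.8 accuracy while its AUC stays at the chance level of 0.5, whereas the second model's AUC reflects how well it ranks the rare class.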

Cite This Paper

Jayalath Ekanayake, "Bug Severity Prediction using Keywords in Imbalanced Learning Environment", International Journal of Information Technology and Computer Science (IJITCS), Vol.13, No.3, pp.53-60, 2021. DOI:10.5815/ijitcs.2021.03.04

