Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results

Publication details

Authors	SINGHA ROY Nivir Kanti, ROSSI Bruno
Year of publication	2017
Type	Article in Proceedings
Conference	43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017
MU Faculty or unit	Faculty of Informatics
Web	http://ieeexplore.ieee.org/document/8051382/
Doi	http://dx.doi.org/10.1109/SEAA.2017.71
Keywords	cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier
Description	Context: Software bug severity classification can help to improve the software bug triaging process. However, severity levels exhibit a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier, which we also compare to a baseline "majority class" classifier. Results: A model that weights classes by the inverse of their instance frequencies yields a statistically significant improvement (with a low effect size) over the standard unweighted SVM model on the assembled dataset. Conclusions: Data imbalance deserves more consideration in future research on severity classification.
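The weighting idea in the description can be illustrated with a minimal sketch. This record does not state the exact formula used in the paper, so the code below assumes one common form of inverse-frequency weighting, n_samples / (n_classes * class_count), which is also what scikit-learn computes for class_weight="balanced". The function names and the severity labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this is one common inverse-frequency formula, not
    necessarily the exact one used in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _report: majority


# Hypothetical, deliberately imbalanced severity labels.
labels = ["normal"] * 6 + ["major"] * 3 + ["critical"] * 1

weights = inverse_frequency_weights(labels)
predict = majority_class_baseline(labels)

# Rare classes receive larger weights, so misclassifying them costs more.
for severity, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{severity}: {weight:.3f}")
print("baseline prediction:", predict("any bug report text"))
```

A dictionary of this shape could be passed as the class_weight argument of scikit-learn's SVC, or class_weight="balanced" could be used directly; the baseline ignores its input and always returns the majority label.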
