Publication details
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
Authors | |
---|---|
Year of publication | 2017 |
Type | Article in Proceedings |
Conference | 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017 |
Web | http://ieeexplore.ieee.org/document/8051382/ |
DOI | http://dx.doi.org/10.1109/SEAA.2017.71 |
Keywords | cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier |
Description | Context: Software bug severity classification can help improve the software bug triaging process. However, severity levels exhibit a high degree of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers into a common format, totaling 17 projects. We test different cost-sensitive strategies that penalize majority classes. We adopt a Support Vector Machine (SVM) classifier, which we also compare against a baseline "majority class" classifier. Results: A model that weights classes by the inverse of their instance frequencies yields a statistically significant improvement (with a low effect size) over the standard unweighted SVM model on the assembled dataset. Conclusions: Data imbalance should receive more consideration in future severity classification research. |
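The abstract names two concrete ingredients: class weights based on the inverse of instance frequencies, and a "majority class" baseline. The sketch below illustrates both in plain Python. It is not the authors' implementation. The exact normalization used in the paper is not stated here, so the code assumes the form w_c = N / (K * n_c) (N instances, K classes, n_c instances of class c). The severity labels are hypothetical illustration data, not taken from the paper's datasets.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * n_c), so rarer classes get larger weights.

    Assumption: this normalization is one common choice; the abstract only says
    weights are based on the inverse of instance frequencies.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _report: majority


# Hypothetical, imbalanced severity labels for a single project (illustrative only).
severities = ["normal"] * 6 + ["major"] * 3 + ["critical"] * 1

weights = inverse_frequency_weights(severities)
predict = majority_class_baseline(severities)

print(weights)                   # "critical" receives the largest weight
print(predict("any bug report")) # always the majority class
```

With this normalization, each class contributes the same total weight (N / K) to training, which is what counteracts the dominance of majority classes. If scikit-learn is used, a dict like `weights` can be passed as the `class_weight` argument of `sklearn.svm.SVC`, which scales the penalty parameter `C` per class; `class_weight="balanced"` applies this same N / (K * n_c) formula automatically.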