Publication information
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
Authors | |
---|---|
Year of publication | 2017 |
Type | Article in proceedings |
Conference | 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017 |
MU Faculty or unit | |
Citation | |
www | http://ieeexplore.ieee.org/document/8051382/ |
DOI | http://dx.doi.org/10.1109/SEAA.2017.71 |
Keywords | cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier |
Description | Context: Software Bug Severity Classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be taken more into consideration in future severity classification research papers. |
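
The inverse-frequency weighting mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not give the exact weighting formula, so the normalisation used here (`n_samples / (n_classes * count)`) is an assumption, and the function names and the toy severity labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Assumed normalisation: n_samples / (n_classes * class_count), so a
    perfectly balanced dataset would give every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def majority_class_baseline(labels):
    """Return the most frequent class, i.e. what a 'majority class' baseline predicts."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical, deliberately imbalanced severity labels (not data from the paper).
severities = ["normal"] * 8 + ["major"] * 3 + ["critical"] * 1

weights = inverse_frequency_weights(severities)
baseline = majority_class_baseline(severities)

# Rare classes receive larger weights, so errors on them cost more in training.
for cls in sorted(weights, key=weights.get):
    print(f"{cls}: {weights[cls]:.3f}")
print("baseline prediction:", baseline)
```

A dictionary of this shape could, for instance, be passed as the `class_weight` argument of scikit-learn's `SVC`; whether the paper used that library or this exact normalisation is not stated in the abstract.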