Publication details

Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results

Authors

SINGHA ROY Nivir Kanti, ROSSI Bruno

Year of publication 2017
Type Article in proceedings
Conference 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017
MU Faculty or unit

Faculty of Informatics

Citation
www http://ieeexplore.ieee.org/document/8051382/
DOI http://dx.doi.org/10.1109/SEAA.2017.71
Keywords cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier
Description Context: Software bug severity classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model on the assembled dataset. Conclusions: Data imbalance should be taken more into consideration in future severity classification research papers.
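The description mentions weighting classes by the inverse of their instance frequencies and comparing against a "majority class" baseline. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the record does not give the exact weighting formula, so the normalization `n_samples / (n_classes * count)` is an assumption, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Assumed normalization: n_samples / (n_classes * count), which gives
    every class a weight of 1.0 when the data are perfectly balanced.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def majority_class_baseline(train_labels):
    """Return a predictor that always answers with the most frequent class."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _bug_report: most_common


# Hypothetical severity labels, not taken from the paper's datasets.
labels = ["normal"] * 6 + ["major"] * 3 + ["critical"] * 1

weights = inverse_frequency_weights(labels)
baseline = majority_class_baseline(labels)

for severity, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{severity}: {weight:.3f}")
print("baseline prediction:", baseline("app crashes on startup"))
```

Rare classes receive larger weights, so misclassifying them costs more during training. A dictionary of this shape could, for instance, be passed as the `class_weight` argument of an SVM implementation that accepts per-class weights, such as scikit-learn's `SVC`.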
