Optimizing Software Fault Prediction using Voting Ensembles in Class Imbalance Scenarios
Ashu Mehta, Amandeep Kaur, and Navdeep Kaur
2024, 20(11): 676-687. doi:10.23940/ijpe.24.11.p4.676687
Abstract
Software defect prediction is severely hampered by class imbalance, which often biases predictions toward the majority class and leads to poor identification of defective instances. This work evaluates how well ensemble techniques improve fault prediction accuracy and examines how class imbalance affects the predictive performance of different machine learning classifiers. Ensemble techniques, namely AdaBoost, GBoost, Stacking, and Voting, were combined with four base classifiers: Bernoulli Naive Bayes (BNB), Gaussian Naive Bayes (GNB), Random Forest, and SVM. Model performance was compared using accuracy, precision, recall, and F1-score, with separate assessments for the defective (Class 1) and non-defective (Class 0) classes. The research was carried out in two stages: first, the ensemble approaches were tested on the imbalanced datasets as-is; then Principal Component Analysis (PCA) was integrated for feature reduction. The results show that the Voting ensemble outperformed the other approaches, consistently delivering balanced precision and recall for both classes. With PCA applied, the Voting+PCA model showed notable improvements across all datasets (CM1, PC1, KC1, KC2), achieving higher accuracy and improved class-specific metrics, including an accuracy of up to 98.20% on PC1. These results indicate that combining PCA with ensemble models, in particular the Voting method, improves predictive performance even under class imbalance, making it a reliable strategy for software defect prediction.
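For illustration, below is a minimal sketch of a Voting+PCA pipeline of the kind described in the abstract, built with scikit-learn. The synthetic imbalanced data, the scaling step, the number of PCA components, and all hyperparameters are assumptions introduced here for the example; they stand in for the NASA datasets (CM1, PC1, KC1, KC2) and the authors' actual settings, which are not given in this abstract.

```python
# Sketch of a soft-voting ensemble over BNB, GNB, Random Forest, and SVM,
# with PCA for feature reduction. Data and hyperparameters are illustrative
# stand-ins, not the paper's datasets or configuration.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data: ~90% non-defective (Class 0), ~10% defective (Class 1).
X, y = make_classification(n_samples=2000, n_features=21, n_informative=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Voting ensemble over the four base classifiers named in the abstract.
voting = VotingClassifier(
    estimators=[
        ("bnb", BernoulliNB()),
        ("gnb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),  # probabilities needed for soft voting
    ],
    voting="soft",
)

# Voting+PCA: scale features, reduce dimensionality, then fit the ensemble.
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),  # assumed component count
    ("vote", voting),
])
model.fit(X_train, y_train)

# Per-class precision, recall, and F1 for Class 0 and Class 1, as reported in the study.
print(classification_report(y_test, model.predict(X_test), digits=3))
```

The per-class report from `classification_report` mirrors the evaluation described above: separate precision, recall, and F1 for the defective and non-defective classes, which is what exposes majority-class bias that overall accuracy alone can hide.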