Comparative Performance Evaluation of Linear, Bagging, and Boosting Models Using BorutaSHAP for Software Defect Prediction on NASA MDP Datasets
DOI: https://doi.org/10.52436/1.jutif.2025.6.6.5393
Keywords: BorutaSHAP, Feature Selection, Machine Learning Ensembles, SMOTE, Software Defect Prediction, XAI
Abstract
Software defect prediction aims to identify potentially defective modules early in order to improve software reliability and reduce maintenance costs. However, high feature dimensionality, irrelevant metrics, and class imbalance often degrade the performance of prediction models. This research compares three groups of classification models (linear, bagging, and boosting), each combined with the BorutaSHAP feature selection method, with the goal of improving prediction stability and interpretability. Twelve datasets from the NASA Metrics Data Program (MDP) were used as benchmarks. The research stages comprised data preprocessing, class balancing with the Synthetic Minority Oversampling Technique (SMOTE), feature selection with BorutaSHAP, and model training with five algorithms: Logistic Regression, Linear SVC, Random Forest, Extra Trees, and XGBoost. The models were evaluated with Stratified 5-Fold Cross-Validation using the F1-score and Area Under the Curve (AUC) metrics. Tree-based ensemble models delivered the most consistent performance. Extra Trees achieved the highest average AUC (0.794 ± 0.05), followed by Random Forest (0.783 ± 0.06). XGBoost achieved the best result on the PC4 dataset (AUC = 0.937 ± 0.008), indicating its ability to handle complex data patterns. These findings indicate that BorutaSHAP is effective at filtering relevant features. They also suggest that it improves classification reliability and strengthens transparency and interpretability within the Explainable Artificial Intelligence (XAI) framework for software quality improvement.
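The following is a minimal sketch of one way to wire the stages described above together: SMOTE, then feature selection, then training, with stratified 5-fold evaluation on F1 and AUC. It rests on several assumptions that are not taken from the paper:

- The data comes from scikit-learn's `make_classification`, which stands in for a NASA MDP dataset.
- `smote` is a hypothetical hand-rolled version of the SMOTE interpolation, not the imbalanced-learn implementation.
- `shadow_feature_filter` is a hypothetical, simplified Boruta-style filter. It uses Extra Trees impurity importances and a majority vote in place of SHAP values and Boruta's statistical test, so it is not BorutaSHAP itself.
- Only Extra Trees is shown, and all hyperparameters are illustrative.
- SMOTE and feature selection are applied to the training fold only, to avoid leakage. The abstract does not state where resampling happens relative to the split.
- The reported values are the mean ± standard deviation over folds. This is only one plausible reading of the ± values in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, seed=0):
    """Hypothetical minimal SMOTE: create synthetic minority (label 1) samples
    by interpolating between a minority point and one of its k nearest
    minority neighbours until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # k + 1 neighbours because each point is returned as its own nearest neighbour
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


def shadow_feature_filter(X, y, n_iter=10, seed=0):
    """Hypothetical simplified Boruta-style filter (impurity importances, not
    SHAP): keep features whose importance beats the best shuffled 'shadow'
    feature in more than half of the iterations."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    hits = np.zeros(n_feat)
    for i in range(n_iter):
        shadow = rng.permuted(X, axis=0)  # shuffle each column independently
        model = ExtraTreesClassifier(n_estimators=50, random_state=i)
        model.fit(np.hstack([X, shadow]), y)
        imp = model.feature_importances_
        hits += imp[:n_feat] > imp[n_feat:].max()
    return np.flatnonzero(hits > n_iter / 2)


def evaluate(X, y, n_splits=5, seed=42):
    """Stratified k-fold CV; returns (mean, std) of F1 and of AUC over folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    f1s, aucs = [], []
    for train, test in skf.split(X, y):
        # Resample and select features using the training fold only
        X_tr, y_tr = smote(X[train], y[train], seed=seed)
        keep = shadow_feature_filter(X_tr, y_tr, seed=seed)
        if keep.size == 0:  # fall back to all features if none pass the filter
            keep = np.arange(X.shape[1])
        model = ExtraTreesClassifier(n_estimators=200, random_state=seed)
        model.fit(X_tr[:, keep], y_tr)
        proba = model.predict_proba(X[test][:, keep])[:, 1]
        f1s.append(f1_score(y[test], (proba >= 0.5).astype(int)))
        aucs.append(roc_auc_score(y[test], proba))
    return (np.mean(f1s), np.std(f1s)), (np.mean(aucs), np.std(aucs))


if __name__ == "__main__":
    # Synthetic stand-in for a NASA MDP dataset: roughly 10% defective modules
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               weights=[0.9, 0.1], random_state=0)
    (f1_m, f1_s), (auc_m, auc_s) = evaluate(X, y)
    print(f"F1 = {f1_m:.3f} ± {f1_s:.3f}, AUC = {auc_m:.3f} ± {auc_s:.3f}")
```

Resampling inside each training fold keeps synthetic minority samples out of the test folds. If oversampling is done before splitting, interpolated neighbours of test points can leak into training and inflate both F1 and AUC.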
License
Copyright (c) 2025 Najla Putri Kartika, Rudy Herteno, Irwan Budiman, Dodon Turianto Nugrahadi, Friska Abadi, Umar Ali Ahmad, Mohammad Reza Faisal

This work is licensed under a Creative Commons Attribution 4.0 International License.





