Comparative Analysis of Machine Learning-Based Software Defect Prediction in Object-Oriented and Structured Paradigms Using Apache Camel and Redis Datasets

Authors

  • Asro Nasiri Faculty of Computer Science, Universitas Amikom Yogyakarta, Indonesia
  • Arief Setyanto Doctor of Informatics, Universitas Amikom Yogyakarta, Indonesia
  • Prof. Ema Utami Doctor of Informatics, Universitas Amikom Yogyakarta, Indonesia
  • Kusrini Doctor of Informatics, Universitas Amikom Yogyakarta, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.1.5315

Keywords:

Empirical Software Engineering, Machine Learning, Object-Oriented, Programming Paradigm, Software Defect Prediction, Software Metrics, Structured

Abstract

Software Defect Prediction (SDP) is a crucial component of software engineering aimed at improving quality and testing efficiency. However, the majority of SDP research overlooks the fundamental influence of the programming paradigm on the nature and causes of defects. This study presents a comparative analysis to identify the most influential software metrics for predicting defects across two distinct paradigms: Object-Oriented (OOP) and Structured. To ensure modern relevance and reproducibility, we constructed two new datasets from large-scale, open-source projects: Apache Camel (Java) for OOP and Redis (C) for Structured, which exhibited realistic defect rates of 14.4% and 21.8%, respectively. The dataset creation process involved mining Git repositories for defect labeling and automated metric extraction using the CK and Lizard tools. Correlation analysis and baseline modeling using Random Forest revealed significant differences between the paradigms. In the OOP system, the dominant defect predictors were related to the complexity of the class interface and features (e.g., uniqueWordsQty, totalMethodsQty, WMC, CBO). Conversely, defects in the structured system were strongly correlated with size and algorithmic complexity (e.g., file_tokens, file_loc, file_ccn_sum). Although the baseline models performed well (ROC-AUC = 0.82–0.87), the significant class imbalance resulted in low recall (44–50%), motivating the need for more context-aware approaches. These findings underscore that effective SDP strategies must be tailored to the underlying programming paradigm.
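The baseline setup described above can be sketched as follows. This is a minimal illustration only: the synthetic features stand in for the paper's metrics (e.g., WMC, CBO, file_loc), and the defect rate is chosen to mimic the reported imbalance; it is not the authors' actual Camel/Redis pipeline.

```python
# Hedged sketch of a Random Forest SDP baseline under class imbalance.
# Assumptions: synthetic metric-like features, ~15% positive (defective)
# class, scikit-learn defaults; not the paper's actual datasets or code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# Four stand-in metrics (imagine columns like WMC, CBO, file_loc, file_tokens).
X = rng.normal(size=(n, 4))
# Defect labels loosely driven by the first two "metrics", heavily imbalanced.
logits = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 2.2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Ranking quality (AUC) can look strong while recall on the minority
# (defective) class stays low -- the pattern the abstract reports.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
rec = recall_score(y_te, clf.predict(X_te))
print(f"ROC-AUC={auc:.2f}  defect recall={rec:.2f}")
```

Mitigations such as class weighting or resampling (e.g., SMOTE, as discussed in the imbalance references below) would target the recall gap without changing the feature set.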


References

B. Dhanalaxmi, G. Apparao Naidu, and K. Anuradha, “A Review on Software Fault Detection and Prevention Mechanism in Software Development Activities,” vol. 17, no. 6, pp. 25–30, doi: 10.9790/0661-17652530.

H. Krasner, “The Cost of Poor Software Quality in the US: A 2020 Report,” Consortium for Information & Software Quality (CISQ), 2021.

A. Bahaa Farid, E. Fathy, A. Sharaf Eldin, and L. Abd-Elmegid, “A Systematic Literature Review of Software Defect Prediction Using Deep Learning,” Journal of Computer Science, vol. 17, pp. 490–510, May 2021, doi: 10.3844/jcssp.2021.490.510.

S. Wang et al., “Machine/Deep Learning for Software Engineering: A Systematic Literature Review,” IEEE Transactions on Software Engineering, Mar. 2023, doi: 10.1109/TSE.2022.3173346.

A. A. P. Ramadhani, R. A. Nugroho, M. R. Faisal, F. Abadi, and R. Herteno, “The impact of software metrics in NASA metric data program dataset modules for software defect prediction,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 22, no. 4, pp. 846–853, Aug. 2024, doi: 10.12928/TELKOMNIKA.v22i4.25787.

B. Khan and A. Nadeem, “Evaluating the effectiveness of decomposed Halstead Metrics in software fault prediction,” PeerJ Comput Sci, vol. 9, 2023, doi: 10.7717/peerj-cs.1647.

T. Siddiqui and M. Mustaqeem, “Performance evaluation of software defect prediction with NASA dataset using machine learning techniques,” International Journal of Information Technology (Singapore), vol. 15, no. 8, pp. 4131–4139, Dec. 2023, doi: 10.1007/s41870-023-01528-9.

R. E. Al-Qutaish and A. Abran, “An Analysis of the Designs and the Definitions of the Halstead’s Metrics.”

K. Bhandari, K. Kumar, and A. L. Sangal, “Data Quality Issues in Software Fault Prediction,” Artificial Intelligence Review, vol. 56, pp. 7839–7908, Dec. 2022.

S. Stradowski and L. Madeyski, “Machine learning in software defect prediction: A business-driven systematic mapping study,” Inf Softw Technol, vol. 155, p. 107128, 2023, doi: 10.1016/j.infsof.2022.107128.

R. Hosseini, B. Turhan, and D. Gunarathna, “A Systematic Literature Review and Meta-Analysis on Cross Project Defect Prediction,” IEEE Transactions on Software Engineering, vol. PP, p. 1, Nov. 2017, doi: 10.1109/TSE.2017.2770124.

E. N. Akimova et al., “A Survey on Software Defect Prediction Using Deep Learning,” Mathematics, vol. 9, no. 11, 2021, doi: 10.3390/math9111180.

C. Hwata, S. Ramasamy, and G. Jekese, “Impact of Object Oriented Design Patterns in Software Development,” Int J Sci Eng Res, Feb. 2015.

R. Verma, K. Kumar, and H. K. Verma, “Code smell prioritization in object-oriented software systems: A systematic literature review,” Journal of Software: Evolution and Process, vol. 35, no. 12, p. e2536, 2023, doi: 10.1002/smr.2536.

F. N. Colakoglu, A. Yazici, and A. Mishra, “Software Product Quality Metrics: A Systematic Mapping Study,” IEEE Access, vol. 9, pp. 44647–44670, 2021, doi: 10.1109/ACCESS.2021.3054730.

H. G. Nunes, A. Santana, E. Figueiredo, and H. Costa, “Tuning Code Smell Prediction Models: A Replication Study,” in Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, in ICPC ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 316–327. doi: 10.1145/3643916.3644436.

N. Van Stein, A. V. Kononova, L. Kotthoff, and T. Bäck, “Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms,” in GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference, Association for Computing Machinery, Inc, Jul. 2025, pp. 943–951. doi: 10.1145/3712256.3726328.

T. Yin, “Lizard: An extensible Cyclomatic Complexity Analyzer,” Astrophysics Source Code Library, Jun. 2019.

A. Zapletal, D. Höhler, C. Sinz, and A. Stamatakis, “The SoftWipe tool and benchmark for assessing coding standards adherence of scientific software,” Sci Rep, vol. 11, no. 1, p. 10015, 2021, doi: 10.1038/s41598-021-89495-8.

I. M. Nasir et al., “Pearson correlation-based feature selection for document classification using balanced training,” Sensors (Switzerland), vol. 20, no. 23, pp. 1–18, Dec. 2020, doi: 10.3390/s20236793.

R. Prasetyo, I. Nawawi, A. Fauzi, and G. Ginabila, “Komparasi Algoritma Logistic Regression dan Random Forest pada Prediksi Cacat Software” [Comparison of Logistic Regression and Random Forest Algorithms for Software Defect Prediction], Jurnal Teknik Informatika UNIKA Santo Thomas, 2021, [Online]. Available: https://api.semanticscholar.org/CorpusID:250590870

The scikit-learn developers, “scikit-learn,” 2025, Zenodo. doi: 10.5281/zenodo.17084288.

D.-K. Kim and Y. K. Chung, “Addressing Class Imbalances in Software Defect Detection,” Journal of Computer Information Systems, vol. 64, no. 2, pp. 219–231, 2024, doi: 10.1080/08874417.2023.2187483.

C. Arun and C. Lakshmi, “Class Imbalance in Software Fault Prediction Data Set,” Artificial Intelligence and Evolutionary Computations in Engineering Systems, vol. 1056, pp. 745–756, 2020.

Z. Liu, T. Su, M. A. Zakharov, G. Wei, and S. Lee, “Software defect prediction based on residual/shuffle network optimized by upgraded fish migration optimization algorithm,” Sci Rep, vol. 15, no. 1, p. 7201, 2025, doi: 10.1038/s41598-025-91784-5.

M. Ali, T. Mazhar, A. Al-Rasheed, T. Shahzad, Y. Y. Ghadi, and M. A. Khan, “Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning,” PeerJ Comput Sci, vol. 10, 2024, doi: 10.7717/peerj-cs.1860.

R. Suguna, J. Suriya Prakash, H. Aditya Pai, T. R. Mahesh, V. Vinoth Kumar, and T. E. Yimer, “Mitigating class imbalance in churn prediction with ensemble methods and SMOTE,” Sci Rep, vol. 15, no. 1, p. 16256, 2025, doi: 10.1038/s41598-025-01031-0.

E. Mashhadi, S. Chowdhury, S. Modaberi, H. Hemmati, and G. Uddin, “An empirical study on bug severity estimation using source code metrics and static analysis,” Journal of Systems and Software, vol. 217, Nov. 2024, doi: 10.1016/j.jss.2024.112179.

H. Kang and S. Do, “ML-Based Software Defect Prediction in Embedded Software for Telecommunication Systems (Focusing on the Case of SAMSUNG ELECTRONICS),” Electronics (Basel), vol. 13, no. 9, 2024, doi: 10.3390/electronics13091690.

S. Haldar and L. F. Capretz, “Interpretable Software Defect Prediction from Project Effort and Static Code Metrics,” Computers, vol. 13, no. 2, 2024, doi: 10.3390/computers13020052.

A. Ouellet and M. Badri, “Combining object-oriented metrics and centrality measures to predict faults in object-oriented software: An empirical validation,” Journal of Software: Evolution and Process, vol. 36, no. 4, Apr. 2024, doi: 10.1002/smr.2548.

R. Malhotra and J. Jain, “Predicting defects in imbalanced data using resampling methods: An empirical investigation,” PeerJ Comput Sci, vol. 8, 2022, doi: 10.7717/peerj-cs.573.

A. Ampatzoglou, S. Bibi, P. Avgeriou, and A. Chatzigeorgiou, “Guidelines for Managing Threats to Validity of Secondary Studies in Software Engineering,” in Contemporary Empirical Methods in Software Engineering, M. Felderer and G. H. Travassos, Eds., Cham: Springer International Publishing, 2020, pp. 415–441. doi: 10.1007/978-3-030-32489-6_15.


Published

2026-02-15

How to Cite

[1]
A. Nasiri, A. Setyanto, E. Utami, and K. Kusrini, “Comparative Analysis of Machine Learning-Based Software Defect Prediction in Object-Oriented and Structured Paradigms Using Apache Camel and Redis Datasets”, J. Tek. Inform. (JUTIF), vol. 7, no. 1, pp. 529–539, Feb. 2026.