Optimizing Heart Disease Classification Using C4.5, Random Forest, and XGBoost with ANOVA, Chi-Square, and AdaBoost
DOI:
https://doi.org/10.52436/1.jutif.2026.7.2.5430
Keywords:
AdaBoost, ANOVA, Chi-Square, Feature Selection, Heart Disease, SMOTE
Abstract
Heart disease remains one of the leading causes of mortality worldwide, underscoring the need for accurate and scalable prediction models in clinical informatics. This study proposes a leakage-safe machine learning pipeline that combines stratified splitting, SMOTE-based imbalance handling, and in-fold feature selection using ANOVA, Chi-Square, and AdaBoost-assisted ranking to improve classification performance on a large heart-disease dataset of 10,000 samples and 21 attributes. Three widely used algorithms, C4.5, Random Forest, and XGBoost, were evaluated to determine the best combination of model and feature selection method for structured medical data. The results indicate that feature relevance contributes more to predictive performance than added model complexity: Random Forest achieved the highest accuracy, precision, recall, and F1-score, at 98.43%, when combined with Chi-Square or ANOVA feature selection. C4.5 rose from 76.52% to 97.57% with AdaBoost-assisted selection, and XGBoost improved from 66.32% to 94.88% after statistical filtering. The dominant features identified, such as CRP, BMI, blood pressure, fasting glucose, LDL, triglycerides, and homocysteine, align with well-established cardiovascular biomarkers, supporting clinical validity. This research contributes to computer science by demonstrating an efficient and scalable hybrid framework of feature selection (FS) and boosting that reduces unnecessary model complexity, improves generalization, and supports low-latency deployment in clinical decision-support systems. The findings highlight the potential of structured-data machine learning to strengthen digital health diagnostics in resource-limited environments.
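The "in-fold" part of the pipeline can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy data, the helper names (`anova_f`, `stratified_folds`, `select_top_k`), the fold count, and the choice of one selected feature are all assumptions made for illustration. The point it shows is that ANOVA F-scores are computed from the training rows of each stratified fold only, so the held-out rows never influence which features are kept. SMOTE resampling and the classifier would likewise be fit on the training rows only; both are omitted here.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if df_between == 0 or df_within <= 0:
        return 0.0
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def select_top_k(X, y, train_idx, k):
    """Rank features by F-score using the training rows only."""
    y_train = [y[i] for i in train_idx]
    scores = []
    for j in range(len(X[0])):
        column = [X[i][j] for i in train_idx]
        scores.append((anova_f(column, y_train), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Hypothetical toy data: feature 0 depends on the label, features 1-2 are noise.
rng = random.Random(42)
y = [0] * 60 + [1] * 20  # imbalanced classes
X = [[rng.gauss(3.0 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

folds = stratified_folds(y, k=5)
selected_per_fold = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    # Selection sees train_idx only; test_idx stays untouched until evaluation.
    selected_per_fold.append(select_top_k(X, y, train_idx, k=1))

print(selected_per_fold)
```

Running selection once on the full dataset before splitting would let the held-out rows shape the feature ranking, which is the leakage the abstract's pipeline is designed to avoid.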
License
Copyright (c) 2026 Andika Pratama, Setiawan Assegaff, Jasmir Jasmir, Nurhadi Nurhadi

This work is licensed under a Creative Commons Attribution 4.0 International License.





