Systematic Optimization of Ensemble Learning for Heart Failure Survival Prediction using SHAP and Optuna

Authors

  • Bayu Setia, Informatics, Universitas Teknologi Yogyakarta, Indonesia
  • Umar Zaky, Information Systems, Universitas Teknologi Yogyakarta, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.5.5324

Keywords:

Ensemble Learning, Heart Failure, Optuna, SHAP, SMOTE

Abstract

Heart failure (HF) is a major global health problem, and accurate, early prediction of patient prognosis is essential for improving clinical management and patient care. Standard machine learning models in this domain are commonly hampered by class imbalance in clinical datasets. To address this, the study introduces a systematically optimized ensemble learning model for classifying patient survival. The methodology was applied to a publicly accessible clinical dataset of 299 heart failure patients. The framework comprised logarithmic transformation, stratified data splitting (80:20), SHAP-based selection of eight key features, and hyperparameter tuning with Optuna over 75 trials, with the objective of maximizing the F1-score under 10-fold cross-validation. Three ensemble models (Random Forest, XGBoost, and LightGBM) were then refined with decision threshold tuning. The fully optimized Random Forest model performed best, attaining an accuracy of 96.67%, an F1-score of 0.9474, and precision and recall of 0.95, with only one false negative and one false positive. The study concludes that the systematic application of SHAP, SMOTE, and Optuna within an ensemble framework substantially improves classification performance on imbalanced HF data, surpassing existing benchmarks. This work thus provides a replicable, systematic framework for developing reliable machine learning models from complex, imbalanced medical datasets, contributing a reusable methodology to computational science.
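
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes a 60-patient test set (20% of 299, rounded up) with 18 true positives, 40 true negatives, one false positive, and one false negative. These counts are not stated in the abstract; they are inferred here because they reproduce the reported accuracy and F1-score. The second function illustrates decision threshold tuning in general terms on toy probabilities. The names `metrics_from_counts` and `best_f1_threshold`, the toy inputs, and the threshold grid are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def best_f1_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest F1 on the grid.

    A sample is predicted positive when its probability is >= the threshold.
    Ties are resolved in favour of the threshold that appears first in the grid.
    """
    best_threshold, best_f1 = grid[0], -1.0
    for t in grid:
        pairs = list(zip(y_true, y_prob))
        tp = sum(1 for y, p in pairs if y == 1 and p >= t)
        fp = sum(1 for y, p in pairs if y == 0 and p >= t)
        fn = sum(1 for y, p in pairs if y == 1 and p < t)
        tn = sum(1 for y, p in pairs if y == 0 and p < t)
        f1 = metrics_from_counts(tp, fp, fn, tn)["f1"]
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# ASSUMED confusion matrix (inferred, not reported in the abstract).
reported = metrics_from_counts(tp=18, fp=1, fn=1, tn=40)

# TOY data for illustration only; not taken from the study.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_probs = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
threshold, f1 = best_f1_threshold(toy_labels, toy_probs, [0.3, 0.4, 0.5, 0.6])

if __name__ == "__main__":
    print({k: round(v, 4) for k, v in reported.items()})
    print(threshold, round(f1, 4))
```

Under the assumed counts, accuracy is 58/60 and precision, recall, and F1 all equal 18/19, which is consistent with the reported 96.67% and 0.9474 if precision and recall were rounded to 0.95. In practice the threshold would be chosen on validation data rather than on the test set, to avoid an optimistic estimate.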

Published

2025-10-31

How to Cite

[1]
B. Setia and U. Zaky, “Systematic Optimization of Ensemble Learning for Heart Failure Survival Prediction using SHAP and Optuna,” J. Tek. Inform. (JUTIF), vol. 6, no. 5, pp. 5320–5332, Oct. 2025.