Cardiovascular Disease Risk Prediction Using Random Forest, RFECV Feature Selection, and SHAP with Multisource Clinical Data Integration
DOI:
https://doi.org/10.52436/1.jutif.2026.7.1.5744
Keywords:
Cardiovascular Disease, Random Forest, SHAP, Data Integration, Risk Prediction System
Abstract
Cardiovascular disease (CVD) remains one of the leading causes of mortality in Indonesia, which makes effective preventive strategies urgent, including risk prediction systems built on population health data. A major challenge in developing CVD prediction models is the limited availability of local medical data that adequately represent the Indonesian population. This study develops a CVD risk prediction model using the Random Forest algorithm and integrates two data sources: private clinical data from cardiology outpatients at RSUD M. Yunus Bengkulu and a publicly available dataset. The data were integrated to compensate for the small size of the private dataset and to improve model performance. The study comprised three experimental settings. Recursive Feature Elimination with Cross-Validation (RFECV) was applied for feature selection, and SHapley Additive exPlanations (SHAP) was used to analyze the contribution of each feature. Scenario 3 of the data integration experiment achieved the best performance, with an accuracy of 73.57%, a recall of 81.44%, and an F1-score of 77.06%. SHAP analysis identified blood pressure and age as the most influential predictors of CVD risk. These findings demonstrate that integrating limited private data with public datasets can significantly improve model performance while providing clinically interpretable insights, particularly in settings where local data are scarce.
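The pipeline described in the abstract (pooling two data sources, selecting features with RFECV, then training and scoring a Random Forest) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' implementation: the sample sizes, the private/public split, the hyperparameters, and the choice of F1 as the selection metric are all assumptions made for the example, and the SHAP analysis step is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the clinical features (NOT the study's data).
X_all, y_all = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=1, random_state=0
)

# Hypothetical split into a small "private" set and a larger "public" set.
X_private, y_private = X_all[:100], y_all[:100]
X_public, y_public = X_all[100:], y_all[100:]

# Data integration: stack both sources into a single pool.
X = np.vstack([X_private, X_public])
y = np.concatenate([y_private, y_public])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RFECV: recursively drop the least important feature and score each
# candidate subset with stratified cross-validation.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",  # assumed metric; the source does not state which was used
)
selector.fit(X_train, y_train)

# Refit a Random Forest on the selected features and evaluate on held-out data.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print("features kept:", selector.n_features_)
print(metrics)
```

The scores printed by this sketch come from synthetic data and are not comparable to the figures reported in the abstract. In a real setting the held-out test set would presumably be drawn in a way that reflects the target population, for example from the private clinical data only.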
License
Copyright (c) 2026 Dea Fania, Indra Waspada, Helmie Arif Wibawa

This work is licensed under a Creative Commons Attribution 4.0 International License.





