Stroke Risk Prediction using Winsorizing Interquartile Range and Tree-Based Classification with Explainable Artificial Intelligence
DOI:
https://doi.org/10.52436/1.jutif.2025.6.6.4760
Keywords:
eXplainable Artificial Intelligence, Machine Learning, Stroke, Tree-based Method, Winsorizing IQR
Abstract
According to the Global Burden of Disease (GBD) Study, stroke is the third leading cause of death globally, so recognizing its signs early is crucial for both prevention and effective treatment. Although machine learning has made significant progress in predicting strokes, many current models operate as "black boxes" that are hard to interpret, and many still produce high error rates. This study aims to improve both prediction accuracy and interpretability in stroke risk detection by combining Winsorizing Interquartile Range (IQR) for outlier management, Random Under-Sampling, tree-based classification, and Explainable Artificial Intelligence (XAI) techniques. Winsorizing IQR handles extreme values, tree-based methods are used for prediction because of their superior performance on tabular data, and XAI techniques improve model transparency and interpretability. The approach was tested on the Cerebral Stroke Prediction-Imbalanced Dataset and compared with various existing models. It achieved the lowest prediction error rates, with a False Positive Rate (FPR) of 15.74% and a False Negative Rate (FNR) of 8.56%. It also attained an accuracy of 84.39%, sensitivity of 91.43%, specificity of 84.26%, Area Under the Receiver Operating Characteristic Curve (AUROC) of 94.74%, and G-Mean of 87.76%, outperforming previous studies in stroke risk prediction. These results indicate that the combined pipeline enhances prediction accuracy and transparency, supporting early stroke detection with improved interpretability. The study contributes to medical informatics by providing transparent predictive models suitable for decision support systems.
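As an illustration only, the outlier step and the reported error metrics can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `winsorize_iqr` and `error_metrics`, the fence multiplier `k = 1.5`, and the inclusive quartile method are assumptions, since the abstract does not specify them.

```python
import math
from statistics import quantiles


def winsorize_iqr(values, k=1.5):
    """Cap values at the IQR fences instead of dropping them.

    Assumption: fences are Q1 - k*IQR and Q3 + k*IQR with k = 1.5,
    the common convention; the abstract does not state the multiplier.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values outside the fences are replaced by the nearest fence.
    return [min(max(v, lower), upper) for v in values]


def error_metrics(tp, fn, tn, fp):
    """Compute the metrics named in the abstract from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fnr": 1 - sensitivity,  # False Negative Rate
        "fpr": 1 - specificity,  # False Positive Rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Hypothetical feature values with one extreme reading.
    print(winsorize_iqr([1, 2, 3, 4, 5, 6, 7, 8, 100]))
    # Hypothetical confusion-matrix counts, not taken from the study.
    print(error_metrics(tp=90, fn=10, tn=80, fp=20))
```

Because FPR is defined as 1 minus specificity, the reported FPR of 15.74% and specificity of 84.26% are two views of the same quantity; the same relation links FNR and sensitivity up to rounding.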
License
Copyright (c) 2025 Fitria Rahmadani, Wiharto, Shaifudin Zuhdi

This work is licensed under a Creative Commons Attribution 4.0 International License.





