Visual Interpretation of Machine Learning Models (Random Forest) for Lung Cancer Risk Classification Using Explainable Artificial Intelligence (SHAP & LIME)
DOI:
https://doi.org/10.52436/1.jutif.2025.6.4.4925
Keywords:
CRISP-DM, Explainable AI, Lung Cancer, Machine Learning, Risk Classification
Abstract
Lung cancer remains one of the most prevalent and burdensome cancers worldwide, and delayed diagnosis is a persistent challenge, particularly in Indonesia, where no national screening program currently exists. This collaborative study aims to develop an interpretable machine learning model for classifying lung cancer risk levels using an Explainable Artificial Intelligence (XAI) approach. The CRISP-DM framework was applied: the dataset underwent cleaning, feature selection, labeling, and transformation, resulting in 152 valid entries. Three tree ensemble algorithms (XGBoost, Random Forest, and LightGBM) were used, with Random Forest achieving the best performance at 97.38% accuracy. SHAP and LIME were integrated to provide transparent visual interpretations of the model's predictions. A web-based system was developed with Streamlit, incorporating these visualizations together with automated narrative summaries generated by a language model to assist non-technical users. A simulated case based on a published pediatric lung cancer report was used to demonstrate the system's interpretability and to illustrate its potential applicability in clinical workflows. The proposed system offers an interpretable and scalable solution for early lung cancer risk classification, which may enhance decision support in primary care and promote trust in AI-assisted diagnostics.
License
Copyright (c) 2025 Irwan Fathur Rosyid, Himawan Pramaditya

This work is licensed under a Creative Commons Attribution 4.0 International License.