Prediction of Life Expectancy of Lung Cancer Patients After Thoracic Surgery Using Decision Tree Algorithm and Adaptive Synthetic Sampling
DOI:
https://doi.org/10.52436/1.jutif.2025.6.5.4724
Keywords:
Algorithm Optimization, Healthcare AI, Survival Prediction, Synthetic Sampling, Thoracic Prognostics
Abstract
This study predicts the life expectancy of lung cancer patients after thoracic surgery using the C4.5 decision tree classification algorithm combined with adaptive synthetic sampling (ADASYN) to handle data imbalance. Data imbalance in the lung cancer patient dataset is a major obstacle to accurate prediction, especially when identifying minority classes. Applying ADASYN makes the class distribution more even, which improves the performance of the C4.5 model. Combining the two methods increased prediction accuracy from 67% to 87%. Precision, recall, and F1-score for the minority classes, which the model previously had difficulty identifying, also improved significantly. The combination of the C4.5 algorithm and ADASYN therefore proved effective in dealing with data imbalance and produced better predictions for lung cancer cases. This study is expected to contribute to the field of medical classification and to serve as a reference for further research on similar cases.
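The abstract does not give implementation details, so the following is only a minimal sketch of the general ADASYN idea: minority samples that have more majority-class neighbours receive more synthetic points, and each synthetic point is an interpolation between a minority sample and one of its minority neighbours. The function name `adasyn`, the parameters `k`, `beta`, and `seed`, and the toy data are assumptions made for illustration; they are not taken from the study or its dataset. In practice a maintained implementation such as the `ADASYN` class in the imbalanced-learn package would likely be used instead.

```python
import math
import random


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples generated in the ADASYN style.

    Illustrative sketch only; parameter names are hypothetical.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # Total number of synthetic samples needed for the requested balance level.
    total_needed = int((n_maj - n_min) * beta)
    if total_needed <= 0 or n_min < 2:
        return []

    def nearest(i, candidates, count):
        # Indices of the `count` candidates closest to sample i, excluding i.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # r_i: share of majority-class points among the k nearest neighbours of x_i.
    ratios = []
    for i in minority_idx:
        nbrs = nearest(i, range(len(X)), k)
        ratios.append(sum(y[j] != minority for j in nbrs) / len(nbrs))
    total = sum(ratios)
    if total == 0:
        # No minority point has majority neighbours: fall back to equal weights.
        ratios = [1.0] * n_min
        total = float(n_min)

    synthetic = []
    for i, r in zip(minority_idx, ratios):
        # Harder-to-learn points (larger r) receive more synthetic samples.
        # Rounding means the final count can differ slightly from total_needed.
        g_i = round(r / total * total_needed)
        minority_nbrs = nearest(i, minority_idx, k)
        for _ in range(g_i):
            j = rng.choice(minority_nbrs)
            lam = rng.random()
            # Interpolate between x_i and a randomly chosen minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (invented for illustration): 8 majority points, 3 minority points.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 2], [1, 2],
     [3, 3], [3.5, 3], [3, 3.5]]
y = [0] * 8 + [1] * 3
new_points = adasyn(X, y, minority=1)
print(len(new_points), "synthetic minority samples generated")
```

The synthetic points would then be appended to the training split only, before fitting the classifier, so that the test data still reflects the original class distribution.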
License
Copyright (c) 2025 Muhammad Erdi, Muhammad Itqan Mazdadi, Radityo Adi Nugroho, Andi Farmadi, Triando Hamonangan Saragih, Hasri Akbar Awal Rozaq

This work is licensed under a Creative Commons Attribution 4.0 International License.