Enhanced Lung Cancer Detection Using ANN with Random Oversampling, RFE-Based Feature Selection, and GridSearchCV Hyperparameter Tuning

Nurwafiqah; M. Yudi Al Fiqran; Annisa Nurul Puteri; Muhammad Arafah; Tatik Maslihatin; A. Sumardin

doi:10.52436/1.jutif.2026.7.2.5391

Authors

Nurwafiqah Informatics, Universitas Teknologi Akba Makassar, Indonesia
M. Yudi Al Fiqran Informatics, Universitas Teknologi Akba Makassar, Indonesia
Annisa Nurul Puteri Computer and Network Engineering, Politeknik Negeri Ujung Pandang, Indonesia
Muhammad Arafah Informatics, Universitas Teknologi Akba Makassar, Indonesia
Tatik Maslihatin Informatics, Universitas Teknologi Akba Makassar, Indonesia
A. Sumardin Informatics, Universitas Teknologi Akba Makassar, Indonesia

Keywords:

Artificial Neural Network, Early Detection, Feature Selection, Lung Cancer Detection, McNemar's Test, Random Oversampling, Recursive Feature Elimination

Abstract

Lung cancer is among the leading causes of death worldwide and one of the most significant oncological burdens, chiefly because most patients are diagnosed only at a late stage. The limitations of conventional diagnostic approaches underscore the need for artificial intelligence-based detection systems that can improve both diagnostic accuracy and efficiency. This study aims to develop a lung cancer prediction model using an Artificial Neural Network (ANN) optimized through an integrated strategy that combines data preprocessing, class balancing via Random Oversampling (ROS), feature selection using Recursive Feature Elimination (RFE), and hyperparameter tuning with Grid Search. Model performance is evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. Experimental results show an accuracy of 98%, with average precision, recall, and F1-score values of 0.95. Statistical validation using McNemar’s test indicates a significant performance improvement over the baseline model (χ² = 18.05, p < 0.001), accompanied by a large effect size (Cohen’s h = 0.82). The model also performs comparably on lung cancer and non-cancer cases, which is consistent with the data balancing and feature selection strategies being effective. These findings suggest that the optimized ANN model has strong potential as a foundation for a medical decision support system for early lung cancer detection, contributing to more reliable diagnoses and better-informed clinical decision-making.
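The two statistics reported in the abstract, McNemar's χ² and Cohen's h, can be illustrated with a minimal sketch. It assumes the continuity-corrected form of McNemar's test (the abstract does not state which variant was used) and the standard arcsine definition of Cohen's h. The function names and the discordant-pair counts below are hypothetical and chosen only for illustration; they are not the study's data.

```python
import math


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar test on paired classifier outputs.

    b: samples the baseline classified correctly but the optimized model did not
    c: samples the optimized model classified correctly but the baseline did not
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    chi2 = (abs(b - c) - 1) ** 2 / n
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


if __name__ == "__main__":
    # Hypothetical discordant counts, NOT taken from the paper.
    chi2, p = mcnemar_chi2(b=0, c=20)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")

    # Hypothetical accuracies for an optimized model and a baseline.
    print(f"h = {cohens_h(0.98, 0.80):.2f}")
```

By Cohen's conventional thresholds, |h| of about 0.2, 0.5, and 0.8 corresponds to small, medium, and large effects, which is why h = 0.82 is described as large. McNemar's test uses only the discordant pairs, so it suits comparing two models evaluated on the same test samples.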
