Implementation of Extra Trees Classifier and Chi-Square Feature Selection for Early Detection of Liver Disease

Muhammad Akmal Al Ghifari; Irwan Budiman; Triando Hamonangan Saragih; Muhammad Itqan Mazdadi; Rudy Herteno; Hasri Akbar Awal Rozaq

doi:10.52436/1.jutif.2025.6.5.4261

Authors

Muhammad Akmal Al Ghifari, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
Irwan Budiman, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
Triando Hamonangan Saragih, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
Muhammad Itqan Mazdadi, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
Rudy Herteno, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
Hasri Akbar Awal Rozaq, Graduate School of Informatics, Department of Computer Science, Gazi University, Türkiye

Keywords:

Chi-Square, Extra Trees Classifier, Feature Selection, Liver

Abstract

Imbalanced class distributions in medical data make liver disease difficult to detect accurately, a serious problem because symptoms often go unnoticed until the disease is advanced. This study applies the Extra Trees Classifier with chi-square feature selection to the early detection of liver disease. Compared with conventional methods such as Random Forest and SVM, the Extra Trees Classifier offers better computational efficiency and handles imbalanced datasets more effectively, while chi-square feature selection identifies the most relevant medical indicators. The dataset comprises five medical variables, likely laboratory test results from patient samples, with each sample labeled as class A or class B. To address the class imbalance, SMOTE was applied, after which the data were randomly split into 80% for training and 20% for testing so that the model could be trained and evaluated effectively. With chi-square feature selection, the Extra Trees Classifier classified liver disease with an accuracy of 82.6%, sensitivity of 85.5%, precision of 78.3%, and F1-score of 81.7%. These results represent a significant improvement over existing methods, and the proposed approach can help healthcare practitioners make timely diagnostic decisions, potentially reducing mortality through early intervention in liver disease cases.
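The two quantitative ideas in the abstract, chi-square feature scoring and the reported evaluation metrics, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names (`chi2_score`, `f1_from`, `classification_metrics`) and the toy data are hypothetical, and the chi-square score uses the common count-based formulation for non-negative features, which is assumed rather than confirmed by the abstract.

```python
from typing import Dict, Sequence


def chi2_score(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Chi-square score of one non-negative feature against class labels.

    Observed value per class = sum of the feature within that class.
    Expected value per class = class frequency * total of the feature.
    (Assumed count-based formulation; higher means more class-dependent.)
    """
    total = sum(feature)
    n = len(labels)
    if total == 0 or n == 0:
        return 0.0  # a constant-zero feature carries no information
    score = 0.0
    for cls in set(labels):
        observed = sum(x for x, y in zip(feature, labels) if y == cls)
        expected = total * sum(1 for y in labels if y == cls) / n
        score += (observed - expected) ** 2 / expected
    return score


def f1_from(precision: float, sensitivity: float) -> float:
    """F1-score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1_from(precision, sensitivity),
    }


# Hypothetical toy data (not from the study): six samples, two classes.
toy_labels = [0, 0, 0, 1, 1, 1]
informative = [0, 0, 0, 4, 4, 4]    # value depends entirely on the class
uninformative = [2, 2, 2, 2, 2, 2]  # identical in both classes

informative_score = chi2_score(informative, toy_labels)
uninformative_score = chi2_score(uninformative, toy_labels)

# Consistency check on the abstract's figures: precision 78.3% and
# sensitivity 85.5% should yield an F1-score close to the reported 81.7%.
reported_f1_check = f1_from(0.783, 0.855)
```

In this sketch the class-dependent feature receives a higher chi-square score than the constant one, which is the ranking principle behind chi-square feature selection. The last line confirms that the reported precision, sensitivity, and F1-score are mutually consistent.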
