Implementation of Extra Trees Classifier and Chi-Square Feature Selection for Early Detection of Liver Disease

Authors

  • Muhammad Akmal Al Ghifari, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
  • Irwan Budiman, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
  • Triando Hamonangan Saragih, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
  • Muhammad Itqan Mazdadi, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
  • Rudy Herteno, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Indonesia
  • Hasri Akbar Awal Rozaq, Graduate School of Informatics, Department of Computer Science, Gazi University, Türkiye

DOI:

https://doi.org/10.52436/1.jutif.2025.6.5.4261

Keywords:

Chi-Square, Extra Trees Classifier, Feature Selection, Liver

Abstract

The imbalanced distribution of medical data makes it difficult to detect liver disease accurately, and accurate detection matters because symptoms often go unnoticed until the disease is advanced. This study examines the use of the Extra Trees Classifier algorithm with chi-square feature selection for early detection of liver disease. Compared with traditional methods such as Random Forest and SVM, the Extra Trees Classifier offers better computational efficiency and better handling of imbalanced datasets, while chi-square feature selection helps identify the most relevant medical indicators. The data consist of five medical variables, likely laboratory test results from patient samples, with labels indicating classes A and B. To address class imbalance, the SMOTE technique was applied, after which the data were randomly split into 80% for training and 20% for testing so that the model's performance could be learned and evaluated on separate samples. With chi-square feature selection, the Extra Trees Classifier gave fairly accurate predictions in liver disease classification: an accuracy of 82.6%, a sensitivity of 85.5%, a precision of 78.3%, and an F1-score of 81.7%. The authors report that these results are a significant improvement over existing methods and that the proposed approach can help healthcare practitioners make timely diagnostic decisions, potentially reducing mortality through early intervention in liver disease cases.
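
The four metrics reported in the abstract are linked by standard definitions, and the F1-score can be cross-checked from the reported precision and sensitivity. The following is a minimal sketch of those definitions, not the authors' evaluation code. The function names and the confusion-matrix counts in the second example are hypothetical and chosen only for illustration; the paper's actual counts are not given in the abstract.

```python
def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, precision, F1) from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts for the positive (diseased) class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # share of actual positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that were right
    return accuracy, sensitivity, precision, f1_from(precision, sensitivity)


# Cross-check using the values reported in the abstract:
# precision = 78.3 %, sensitivity = 85.5 %.
reported_f1 = f1_from(0.783, 0.855)
print(round(reported_f1 * 100, 1))  # 81.7, matching the reported F1-score

# Hypothetical counts (NOT from the paper), only to show the function in use.
acc, sens, prec, f1 = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(acc, sens, prec, round(f1, 3))
```

Because F1 is the harmonic mean of precision and sensitivity, it always lies between the two; the reported 81.7% sits between 78.3% and 85.5%, as expected.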

Published

2025-10-16

How to Cite

[1] M. A. Al Ghifari, I. Budiman, T. H. Saragih, M. I. Mazdadi, R. Herteno, and H. A. A. Rozaq, “Implementation of Extra Trees Classifier and Chi-Square Feature Selection for Early Detection of Liver Disease,” J. Tek. Inform. (JUTIF), vol. 6, no. 5, pp. 3405–3418, Oct. 2025.