Performance Comparison of AdaBoost, LightGBM, and CatBoost for Parkinson's Disease Classification Using ADASYN Balancing

Muhammad Ridha Anshari; Triando Hamonangan Saragih; Muliadi Muliadi; Dwi Kartini; Fatma Indriani; Hasri Akbar Awal Rozaq; Oktay Yıldız

doi:10.52436/1.jutif.2025.6.5.4726

Authors

Muhammad Ridha Anshari, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Kalimantan, Indonesia
Triando Hamonangan Saragih, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Kalimantan, Indonesia
Muliadi Muliadi, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Kalimantan, Indonesia
Dwi Kartini, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Kalimantan, Indonesia
Fatma Indriani, Faculty of Mathematics and Natural Science, Department of Computer Science, Lambung Mangkurat University, Kalimantan, Indonesia
Hasri Akbar Awal Rozaq, Graduate School of Informatics, Department of Computer Science, Gazi University, Ankara, Türkiye
Oktay Yıldız, Faculty of Engineering, Department of Computer Engineering, Gazi University, Ankara, Türkiye

Keywords:

ADASYN, AdaBoost, CatBoost, LightGBM, Parkinson's Disease

Abstract

Parkinson's disease is a neurodegenerative condition characterized by the loss of dopamine-producing neurons, which causes motor symptoms such as tremors and muscle stiffness. Early diagnosis is challenging because no definitive laboratory test exists. This study aims to improve the accuracy of Parkinson's diagnosis from voice recordings using three boosting algorithms: AdaBoost, LightGBM, and CatBoost. The dataset is Parkinson's Disease Detection from Kaggle, consisting of 195 records with 22 attributes. The data were scaled with Min-Max normalization, and class imbalance was addressed with ADASYN. ADASYN-LightGBM and ADASYN-CatBoost performed best, each reaching 96.92% accuracy, 97.10% precision, 96.92% recall, and a 96.92% F1 score. These results suggest that ADASYN is effective at addressing data imbalance and that combining it with boosting methods can improve performance on medical classification problems. The findings contribute to the development of intelligent diagnostic systems in medical informatics and computer science, supporting more accurate and efficient tools for early diagnosis and better management of Parkinson's disease.
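
The two preprocessing steps named in the abstract, Min-Max normalization and ADASYN balancing, can be illustrated with a small dependency-free sketch. This is not the study's code: the function names (`min_max_scale`, `adasyn`), the parameters `k`, `beta`, and `seed`, and the toy data are all assumptions made for illustration, and the ADASYN routine is a simplified version that only handles two classes. It rescales each feature to [0, 1] and then synthesizes extra minority rows by interpolating between minority neighbours, generating more of them for minority rows surrounded by majority rows.

```python
import math
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1]; a constant column maps to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(columns, lows)]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Simplified two-class ADASYN (illustrative, not a library implementation)."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(X) - len(minority_rows)
    target = round((n_majority - len(minority_rows)) * beta)  # rows to synthesize
    if target <= 0 or len(minority_rows) < 2:
        return list(X), list(y)

    # r_i: fraction of majority rows among the k nearest neighbours of each minority row.
    labelled = list(zip(X, y))
    ratios = []
    for x in minority_rows:
        others = [pair for pair in labelled if pair[0] is not x]
        nearest = sorted(others, key=lambda pair: math.dist(x, pair[0]))[:k]
        ratios.append(sum(label != minority for _, label in nearest) / k)
    total_ratio = sum(ratios)
    if total_ratio == 0:  # no minority row has a majority neighbour
        return list(X), list(y)

    synthetic = []
    for x, r in zip(minority_rows, ratios):
        # Interpolate only towards other minority rows.
        peers = sorted((p for p in minority_rows if p is not x),
                       key=lambda p: math.dist(x, p))[:k]
        for _ in range(round(r / total_ratio * target)):
            z = rng.choice(peers)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


# Toy data (made up): 12 majority rows (label 0) and 4 minority rows (label 1).
rng = random.Random(42)
X_raw = [[rng.uniform(120, 260), rng.uniform(0.2, 1.0)] for _ in range(12)]
X_raw += [[rng.uniform(100, 150), rng.uniform(0.0, 0.5)] for _ in range(4)]
y = [0] * 12 + [1] * 4

X_scaled = min_max_scale(X_raw)
X_bal, y_bal = adasyn(X_scaled, y, minority=1)
print("class counts before:", y.count(0), y.count(1))
print("class counts after: ", y_bal.count(0), y_bal.count(1))
```

Because the per-row counts are rounded, the balanced classes may differ in size by a few rows. In practice a maintained implementation such as `MinMaxScaler` from scikit-learn and `ADASYN` from imbalanced-learn would normally be preferred. The abstract does not state whether balancing was applied before or after any train/test split; the sketch simply balances whatever rows it is given.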
