IMPLEMENTATION OF DIABETES PREDICTION MODEL USING RANDOM FOREST ALGORITHM, K-NEAREST NEIGHBOR, AND LOGISTIC REGRESSION

  • Rio Pratama, Informatics Engineering, Faculty of Computer Science, Universitas Buana Perjuangan Karawang, Indonesia
  • Amril, Department of Informatics, Faculty of Computer Science, Buana Perjuangan Karawang University, Indonesia
  • Santi Arum Puspita Lestari, Department of Informatics, Faculty of Computer Science, Buana Perjuangan Karawang University, Indonesia
  • Sutan Faisal, Department of Informatics, Faculty of Computer Science, Buana Perjuangan Karawang University, Indonesia
Keywords: Diabetes, KNN, Random Forest, Logistic Regression

Abstract

Diabetes is a serious metabolic disease that can cause various health complications. With more than 537 million people worldwide living with diabetes in 2021, early detection is crucial for preventing further complications. This research aims to predict the risk of diabetes using three machine learning algorithms: Random Forest (RF), K-Nearest Neighbor (KNN), and Logistic Regression (LR). Previous research has explored a variety of algorithms and techniques, with varying accuracy. This research uses the UCI diabetes dataset, obtained from Kaggle, which consists of 768 records with 8 parameters; the data were prepared through pre-processing and normalization. The models were evaluated using accuracy, the confusion matrix, and ROC-AUC. The results showed that Logistic Regression performed best, with 77% accuracy and an AUC of 0.83, compared to KNN (75% accuracy, AUC 0.81) and Random Forest (74% accuracy, AUC 0.81). These findings emphasize the importance of appropriate algorithm selection and careful data pre-processing in diabetes risk prediction. This study concludes that, of the three methods, Logistic Regression is the most effective for predicting diabetes risk on the dataset used.
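The workflow summarized in the abstract (normalize the features, train a classifier, then score it with accuracy, a confusion matrix, and ROC-AUC) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code. The abstract does not state the normalization method, the value of k, or the decision threshold, so min-max scaling, k = 3, and a 0.5 threshold are assumptions, and the toy records are invented rather than drawn from the dataset. Only KNN is shown; the Random Forest and Logistic Regression models are not reproduced.

```python
import math
from collections import Counter


def min_max_fit(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_transform(rows, mins, maxs):
    """Rescale each feature with the training bounds (min-max is an assumed
    method); training rows land in [0, 1]. Constant features map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def knn_scores(train_x, train_y, test_x, k=3):
    """For each test row, return the fraction of its k nearest training rows
    (Euclidean distance) labelled 1. k=3 is an assumption, not the paper's value."""
    scores = []
    for row in test_x:
        ranked = sorted((math.dist(row, t), label) for t, label in zip(train_x, train_y))
        scores.append(sum(label for _, label in ranked[:k]) / k)
    return scores


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    c = Counter(zip(y_true, y_pred))
    return c[(0, 0)], c[(0, 1)], c[(1, 0)], c[(1, 1)]


def accuracy(y_true, y_pred):
    """Share of correct predictions, read off the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores higher than a
    random negative, counting ties as half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data with two made-up features; NOT taken from the dataset.
train_x = [[80, 20], [90, 25], [85, 22], [150, 35], [160, 40], [155, 33]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[88, 21], [158, 38]]
test_y = [0, 1]

mins, maxs = min_max_fit(train_x)
train_scaled = min_max_transform(train_x, mins, maxs)
test_scaled = min_max_transform(test_x, mins, maxs)

scores = knn_scores(train_scaled, train_y, test_scaled, k=3)
preds = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print("confusion (tn, fp, fn, tp):", confusion_matrix(test_y, preds))
print("accuracy:", accuracy(test_y, preds))
print("ROC-AUC:", roc_auc(test_y, scores))
```

Fitting the scaling bounds on the training split only, then reusing them on the test split, keeps test-set information out of training. The KNN vote fraction also serves as the continuous score that ROC-AUC needs, which is why the AUC is computed from `scores` rather than from the thresholded `preds`.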



Published: 2024-09-03

How to Cite: R. Pratama, A. M. Siregar, S. A. P. Lestari, and S. Faisal, “IMPLEMENTATION OF DIABETES PREDICTION MODEL USING RANDOM FOREST ALGORITHM, K-NEAREST NEIGHBOR, AND LOGISTIC REGRESSION”, J. Tek. Inform. (JUTIF), vol. 5, no. 4, pp. 1165-1174, Sep. 2024.