Performance Comparison Of K-Nearest Neighbors And Decision Tree Algorithms With Random Oversampling For Imbalanced Heart Disease Classification

Authors

  • Dita Yustianisa, Informatics, Universitas Sulawesi Barat, Indonesia
  • Farid Wajidi, Informatics, Universitas Sulawesi Barat, Indonesia
  • Wawan Firgiawan, Informatics, Universitas Sulawesi Barat, Indonesia
  • Adinda Gama Sholeha, Computer Science, Albukhary International University, Malaysia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.3.5626

Keywords:

Classification, Data Mining, Decision Tree, Heart Disease, K-Nearest Neighbors, Random Oversampling

Abstract

Heart disease remains one of the leading causes of mortality worldwide, including in Indonesia, where delayed detection continues to be a serious challenge. In medical data mining, class imbalance often degrades classification performance by reducing sensitivity to minority-class cases. This study compares the performance of the K-Nearest Neighbors (KNN) and Decision Tree algorithms for heart disease classification and evaluates the effectiveness of random oversampling in handling imbalanced data. The dataset, obtained from Kaggle, consists of 10,000 medical records. Data preprocessing includes categorical transformation, missing-value imputation using KNN Imputer, and Min–Max normalization. Random oversampling is applied to increase minority-class representation. Models are evaluated using stratified 10-fold cross-validation, with accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC–AUC) as performance metrics. After random oversampling, the KNN model achieves the best performance, with an accuracy of 94%, precision of 96%, recall of 90%, F1-score of 92%, and ROC–AUC of 90.2%. In comparison, the Decision Tree model records an accuracy of 80%, precision of 78%, recall of 81%, F1-score of 79%, and ROC–AUC of 81.5%. These findings demonstrate that random oversampling significantly improves minority-class detection, particularly for KNN. This study contributes to Informatics by providing empirical evidence that simple and efficient data mining strategies can effectively address class imbalance in large-scale medical datasets, supporting the development of accurate, interpretable, and accessible AI-based diagnostic systems for early heart disease detection.
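
To make two of the steps named in the abstract concrete, the following is a minimal sketch of random oversampling and of how accuracy, precision, recall, and F1-score follow from confusion counts. It is not the authors' implementation: the function names `random_oversample` and `binary_metrics`, the toy records, and the example predictions are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to close the gap to the majority count.
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (hypothetical): 8 records of class 0 and 2 records of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=42)
balanced_counts = Counter(y_bal)

# Hypothetical predictions, used only to exercise the metric function.
scores = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

When oversampling is combined with cross-validation, a common precaution is to oversample only the training portion of each fold, so that duplicated minority records cannot appear in both the training and the test split. The abstract does not state which ordering was used.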

Published

2026-06-15

How to Cite

[1]
D. Yustianisa, F. Wajidi, W. Firgiawan, and A. G. Sholeha, “Performance Comparison Of K-Nearest Neighbors And Decision Tree Algorithms With Random Oversampling For Imbalanced Heart Disease Classification”, J. Tek. Inform. (JUTIF), vol. 7, no. 3, pp. 2633–2645, Jun. 2026.