• Irfan Soliani, Information Systems, Faculty of Information Technology, Universitas Budi Luhur, DKI Jakarta, Indonesia
• Safitri Juanita, Information Systems, Faculty of Information Technology, Universitas Budi Luhur, DKI Jakarta, Indonesia
Keywords: CRISP-DM, Davies-Bouldin Index, K-Means Algorithm, Prevalence of Disease


In 2019, the World Health Organization (WHO) stated that the top 10 causes of death accounted for 55% of the 55.4 million deaths worldwide. In Indonesia, West Java is the most populous province, and its capital is Bandung. According to the health profile of the Bandung City Hospital, the ten most common diseases accounted for 18,147 cases. However, these data have not yet been processed into information that helps the health department, particularly that of Bandung City, identify disease cases by age group. The contribution of this study is therefore to group the prevalence of disease cases by age at the Bandung City Hospital, with the aim of helping the Bandung City Health Office take preventive, treatment and counselling measures against diseases according to their prevalence in each age group. The study follows the CRISP-DM methodology, applies K-Means clustering, and evaluates the clusters with the elbow method and the Davies-Bouldin Index (DBI). Data were processed with RapidMiner and Python. The study concludes that the optimal number of clusters is K = 2. Cluster 0 contains the diseases with the fewest cases, and cluster 1 contains the diseases with the most cases. Cluster 1 corresponds to the elderly and adult age groups, while cluster 0 corresponds to the infant, toddler and child age groups.
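The abstract combines three techniques: K-Means to form the groups, the elbow method (within-cluster sum of squared errors, SSE, plotted against K) and the Davies-Bouldin Index (lower is better) to choose K. The pure-Python sketch below shows how they fit together. It is an illustration under stated assumptions, not the authors' RapidMiner or Python pipeline: the points are hypothetical toy values rather than the Bandung data, distances are Euclidean, and the deterministic farthest-first seeding is a swapped-in choice made so the run is reproducible (the study's own initialization is not stated).

```python
import math

# Hypothetical toy data, NOT the study's data: each row is one disease,
# described by two made-up case counts (e.g. younger vs. older patients).
POINTS = [
    (10, 12), (12, 10), (11, 11), (9, 13),          # low-case diseases
    (100, 120), (110, 115), (105, 125), (95, 118),  # high-case diseases
]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_farthest_first(points, k):
    """Assumed deterministic seeding: start at points[0], then repeatedly
    take the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = init_farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels, centroids


def sse(points, labels, centroids):
    """Within-cluster sum of squared distances (the elbow-method curve)."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def davies_bouldin(points, labels):
    """Davies-Bouldin Index: mean over clusters i of max_j (S_i + S_j) / d(c_i, c_j),
    where S_i is the average distance of cluster i's points to its centroid c_i."""
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    cents = {c: mean(members[c]) for c in clusters}
    scatter = {c: sum(dist(p, cents[c]) for p in members[c]) / len(members[c]) for c in clusters}
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j]) for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)


if __name__ == "__main__":
    for k in range(1, 5):
        labels, cents = kmeans(POINTS, k)
        line = f"k={k}  SSE={sse(POINTS, labels, cents):.1f}"
        if k >= 2:  # DBI needs at least two clusters
            line += f"  DBI={davies_bouldin(POINTS, labels):.3f}"
        print(line)
```

On this toy data the SSE drops sharply from K = 1 to K = 2 (the "elbow") and the DBI is lowest at K = 2, so both criteria agree. With real data the two can disagree, which is one reason to report both.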




How to Cite
I. Soliani and S. Juanita, “GROUPING THE PREVALENCE OF DISEASE CASES BY AGE IN BANDUNG CITY HOSPITALS USING K-MEANS”, J. Tek. Inform. (JUTIF), vol. 3, no. 6, pp. 1647-1654, Dec. 2022.