Web-Based Diabetes Risk Prediction System Using K-NN on Kaggle Early Stage Diabetes Dataset
DOI: https://doi.org/10.52436/1.jutif.2025.6.5.5277
Keywords: Data Preprocessing, Diabetes Prediction, Early Detection, K-Nearest Neighbours, PHP-ML, Web-Based System
Abstract
Diabetes mellitus affects approximately 537 million adults globally, and its rising prevalence poses serious health and economic burdens. Early detection is crucial to reduce the risk of complications and improve patient outcomes. This study aims to design and implement a web-based diabetes risk prediction system that uses the K-Nearest Neighbors (K-NN) algorithm to support symptom-based early detection. The system uses the Kaggle Early Stage Diabetes Risk Prediction Dataset, which contains 520 records with 17 symptom attributes and one class label. Data preprocessing includes converting categorical data into numerical values, discretizing age into predefined ranges, and applying min-max scaling to normalize feature values. K-NN classification was performed with K values of 1, 3, and 5, using the PHP Machine Learning (PHP-ML) library with MySQL database integration. The system achieved its highest accuracy, 93.46%, at K = 1. Manual testing confirmed that the system processes symptom inputs correctly and returns predictions consistent with the training data. This web-based tool offers an accessible platform for early diabetes risk screening, supporting self-assessment and triage. The study demonstrates that PHP-ML can be used to implement machine learning effectively in a web environment; the system could be further enhanced through parameter optimization and integration with larger, more diverse datasets to strengthen generalization.
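The three preprocessing steps named in the abstract (categorical-to-numeric conversion, age discretization, min-max scaling) can be illustrated with a minimal Python sketch. It is not the authors' PHP code. It assumes the dataset stores its attributes as `Yes`/`No`, `Male`/`Female` and `Positive`/`Negative` strings under column names such as `Age`, `Gender`, `Polyuria` and `class`; the 0/1 codes and the age ranges in `AGE_BINS` are hypothetical, since the abstract does not state the actual mapping or ranges.

```python
from typing import Dict, List

# Hypothetical 0/1 codes for the dataset's two-valued text attributes (assumption).
BINARY_MAP: Dict[str, int] = {"Yes": 1, "No": 0, "Male": 1, "Female": 0}
CLASS_MAP: Dict[str, int] = {"Positive": 1, "Negative": 0}

# Hypothetical age ranges as (exclusive upper bound, code) pairs;
# the paper's actual predefined ranges are not stated in the abstract.
AGE_BINS = [(30, 1), (40, 2), (50, 3), (60, 4)]


def discretize_age(age: int) -> int:
    """Map a raw age to an ordinal range code from 1 to 5."""
    for upper, code in AGE_BINS:
        if age < upper:
            return code
    return len(AGE_BINS) + 1  # 60 and above


def encode_row(row: Dict[str, str]) -> List[float]:
    """Convert one raw record to a numeric feature vector, skipping the label."""
    features: List[float] = []
    for key, value in row.items():
        if key == "class":
            continue
        if key == "Age":
            features.append(float(discretize_age(int(value))))
        else:
            features.append(float(BINARY_MAP[value]))
    return features


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; a constant column becomes all 0.0."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Three invented records using a subset of the dataset's columns.
raw = [
    {"Age": "25", "Gender": "Male", "Polyuria": "No", "class": "Negative"},
    {"Age": "45", "Gender": "Female", "Polyuria": "Yes", "class": "Positive"},
    {"Age": "65", "Gender": "Male", "Polyuria": "Yes", "class": "Positive"},
]
X = min_max_scale([encode_row(r) for r in raw])
y = [CLASS_MAP[r["class"]] for r in raw]
print(X)  # [[0.0, 1.0, 0.0], [0.5, 0.0, 1.0], [1.0, 1.0, 1.0]]
print(y)  # [0, 1, 1]
```

Binary attributes already lie in [0, 1], so scaling mainly affects the discretized age column. In a deployed system the minimum and maximum would typically be taken from the training records only and reused to scale each new entry submitted through the web form, since a single input row has no range of its own.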
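How a K-NN classifier reaches a label, and how the K = 1, 3, 5 comparison in the abstract could be run, can be shown with a from-scratch Python sketch. It assumes Euclidean distance and plain majority voting; PHP-ML offers its own `KNearestNeighbors` classifier, and the authors used that library rather than code like this. The toy feature vectors are invented and already scaled to [0, 1]; they are not drawn from the Kaggle dataset, so the printed accuracies say nothing about the reported 93.46%.

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X: List[List[float]], train_y: List[int],
                sample: Sequence[float], k: int) -> int:
    """Label `sample` by majority vote among its k nearest training rows."""
    # Rank training rows from nearest to farthest (stable for equal distances).
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], sample))
    # Count labels of the k nearest rows, nearest first.
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied count, most_common keeps first-encountered order (nearest wins).
    return votes.most_common(1)[0][0]


def accuracy(train_X: List[List[float]], train_y: List[int],
             test_X: List[List[float]], test_y: List[int], k: int) -> float:
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == t
                  for x, t in zip(test_X, test_y))
    return correct / len(test_y)


# Invented, already-scaled toy data (1 = positive, 0 = negative).
train_X = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
           [1.0, 1.0], [0.9, 0.8], [0.8, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, 0.0], [0.95, 0.9], [0.6, 0.6]]
test_y = [0, 1, 1]

for k in (1, 3, 5):
    acc = accuracy(train_X, train_y, test_X, test_y, k)
    print(f"K={k}: accuracy={acc:.2%}")  # 100.00% for each K on this toy data
```

K-NN has no training phase beyond storing the labelled rows, so every prediction scans the full training set. With two classes, odd values of K such as those tested in the study rule out tied votes; for even K the sketch falls back to the label of the nearest neighbour.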
License
Copyright (c) 2025 Fahmi Ruziq, M. Rhifky Wayahdi

This work is licensed under a Creative Commons Attribution 4.0 International License.