Improving the Accuracy of Stunting Prediction in Children in Pagar Alam City Using XGBoost Feature Selection and K-Nearest Neighbor Classification

Authors

  • Ferry Putrawansyah, Department of Science and Technology, Institut Teknologi Pagar Alam, Indonesia
  • Mohd. Yazid Idris, Centre for Advanced Composite Materials (CACM), Universiti Teknologi Malaysia, Malaysia
  • Febriansyah, Department of Science and Technology, Institut Teknologi Pagar Alam, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.6.5473

Keywords:

Accuracy, Enhancing, K-Nearest Neighbor, Prediction, Stunting, XGBoost

Abstract

Stunting remains a major public health concern in Indonesia, including in Pagar Alam City. Early identification of at-risk children is essential to enable timely interventions and reduce long-term developmental consequences. However, predictive models such as K-Nearest Neighbor (K-NN) often experience reduced accuracy when faced with irrelevant features and imbalanced class distributions. This study integrates feature selection using Extreme Gradient Boosting (XGBoost) to enhance the predictive performance of K-NN in assessing stunting risk. Child growth data obtained from local health facilities were analyzed to build an initial baseline model, which exhibited limited accuracy due to excessive attributes and class imbalance. Through feature-importance analysis, XGBoost identified key predictors including sex, age, weight, and height. The optimized dataset was then used to retrain the K-NN model. Evaluation using accuracy, precision, recall, and F1-score demonstrated an improvement in accuracy from 85.63% to 93.72%. Beyond the computational results, this research provides significant contributions to the field of health informatics. The integration of XGBoost and K-NN offers an efficient analytical mechanism suitable for clinical decision support systems, particularly for data-driven screening in primary healthcare settings. The optimized, lightweight model can be embedded into health information systems to support child growth monitoring, strengthen evidence-based policymaking, and assist healthcare workers in targeting interventions more effectively. This approach can be replicated across other regions, supporting nationwide efforts to reduce stunting prevalence.
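
The pipeline described in the abstract (rank features by importance, keep the strongest predictors, classify with K-NN, then score with accuracy, precision, recall, and F1) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the importance scores, the `record_id` attribute, the toy child records, and all function names are hypothetical. In a real run the scores would come from a fitted XGBoost model (for example the `feature_importances_` attribute of `xgboost.XGBClassifier`) rather than being typed in by hand.

```python
import math
from collections import Counter

# Hypothetical importance scores standing in for what a trained XGBoost
# model would report; the numbers are illustrative, not from the study.
IMPORTANCES = {
    "height_cm": 0.45,
    "age_months": 0.30,
    "weight_kg": 0.15,
    "sex": 0.08,
    "record_id": 0.02,  # irrelevant attribute the selection step should drop
}


def row(sex, age_months, weight_kg, height_cm, record_id):
    """Build one child record as a dict keyed by feature name."""
    return {"sex": sex, "age_months": age_months, "weight_kg": weight_kg,
            "height_cm": height_cm, "record_id": record_id}


# Made-up records; label 1 = stunted, 0 = not stunted.
TRAIN_ROWS = [
    row(0, 24, 12.0, 87.0, 901), row(1, 24, 11.5, 86.0, 15),
    row(0, 24, 9.5, 78.0, 902), row(1, 24, 9.0, 77.0, 17),
    row(0, 36, 14.0, 96.0, 450), row(1, 36, 11.0, 85.0, 455),
]
TRAIN_LABELS = [0, 0, 1, 1, 0, 1]
TEST_ROWS = [row(0, 24, 11.8, 86.5, 16), row(1, 24, 9.2, 77.5, 900)]
TEST_LABELS = [0, 1]


def select_features(importances, top_k):
    """Return the top_k feature names, highest importance first."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:top_k]


def knn_predict(train_rows, train_labels, query, features, k=3):
    """Majority vote among the k training rows closest to `query`,
    using Euclidean distance over the selected features only."""
    scored = []
    for train_row, label in zip(train_rows, train_labels):
        dist = math.sqrt(sum((train_row[f] - query[f]) ** 2 for f in features))
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


selected = select_features(IMPORTANCES, top_k=4)
predictions = [knn_predict(TRAIN_ROWS, TRAIN_LABELS, q, selected)
               for q in TEST_ROWS]
metrics = classification_metrics(TEST_LABELS, predictions)
print(selected)
print(predictions)
print(metrics)
```

The sketch leaves the features unscaled to stay short. Because K-NN relies on raw distances, a practical pipeline would normally standardize or min-max scale age, weight, and height before classification, and would choose `k` and the number of retained features by cross-validation.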

Published

2026-01-05

How to Cite

[1] F. Putrawansyah, M. Y. Idris, and F. Febriansyah, “Improving the Accuracy of Stunting Prediction in Children in Pagar Alam City Using XGBoost Feature Selection and K-Nearest Neighbor Classification,” J. Tek. Inform. (JUTIF), vol. 6, no. 6, pp. 5882–5898, Jan. 2026.