Geographically Weighted Random Forests for Human Development Index of Central Java Prediction

Authors

  • Shaifudin Zuhdi Department of Informatics, Sebelas Maret University, Indonesia
  • Isna Nurul Fatatik Department of Data Science, Sebelas Maret University, Indonesia
  • Izlah Nur Fadlila Herawati Prihasno Department of Data Science, Sebelas Maret University, Indonesia
  • Hasri Akbar Awal Rozaq Graduate School of Informatics, Department of Computer Science, Gazi University, Ankara, Türkiye

DOI:

https://doi.org/10.52436/1.jutif.2025.6.4.5204

Keywords:

GWR, Human Development Index, Random Forests, Spatial Analysis

Abstract

The geographically weighted regression (GWR) model has been widely used for prediction tasks, including prediction of the human development index, and the random forests (RF) model is likewise widely used for predicting numerical values. Each has a limitation. GWR assumes a locally linear relationship between the dependent and independent variables, and it is susceptible to multicollinearity among the independent variables, which can lead to overfitting when that multicollinearity is high. RF produces a single global model that cannot represent conditions at each location. Because RF is not vulnerable to multicollinearity among the independent variables, combining the two yields the geographically weighted random forests (GWRF) model, which addresses the shortcomings of both GWR and RF. The GWR and GWRF models were constructed using data from the districts and cities of Central Java Province, selected as the study area because of evident disparities in human development index achievement. These disparities indicate spatial heterogeneity that conventional models fail to capture adequately. To evaluate model performance, data from 2023 were used as training data and data from 2024 as testing data. The GWRF model is capable of producing a model that does not overfit when there is multicollinearity among the independent variables.
This research integrates spatial econometric and machine learning approaches, providing a more robust framework for addressing complex spatial variation in human development outcomes. The GWRF model outperforms both GWR and RF, delivering high predictive accuracy under complex variable relationships while capturing local spatial heterogeneity that conventional approaches fail to address.
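
The idea of fitting a separate random forest around each location can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bisquare kernel, the nearest-neighbour bandwidth, the helper names (`bisquare_weights`, `fit_local_forests`, `predict_local`), and the synthetic data are all assumptions made for illustration. The 35 synthetic locations stand in for districts and cities, and the third predictor is made nearly collinear with the second on purpose.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def bisquare_weights(distances, bandwidth):
    """Bisquare kernel: weight 1 at distance 0, falling to 0 at the bandwidth."""
    ratio = np.clip(distances / bandwidth, 0.0, 1.0)
    return (1.0 - ratio ** 2) ** 2


def fit_local_forests(coords, X, y, n_neighbors=10, n_estimators=50, seed=0):
    """Fit one random forest per training location on its nearest neighbours."""
    models = []
    for i in range(len(coords)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        idx = np.argsort(dist)[:n_neighbors]
        # Adaptive bandwidth (assumption): just beyond the farthest neighbour kept,
        # so every selected neighbour receives a positive weight.
        bandwidth = dist[idx].max() * 1.0001 + 1e-12
        weights = bisquare_weights(dist[idx], bandwidth)
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[idx], y[idx], sample_weight=weights)
        models.append(rf)
    return models


def predict_local(models, train_coords, new_coords, X_new):
    """Predict each new observation with the forest of the nearest training location."""
    preds = np.empty(len(new_coords))
    for j, point in enumerate(new_coords):
        nearest = np.argmin(np.linalg.norm(train_coords - point, axis=1))
        preds[j] = models[nearest].predict(X_new[j:j + 1])[0]
    return preds


# Synthetic stand-in data (hypothetical, not the Central Java dataset).
rng = np.random.default_rng(42)
n = 35
coords = rng.uniform(0, 10, size=(n, 2))
X_train = rng.normal(size=(n, 3))
X_train[:, 2] = X_train[:, 1] + rng.normal(scale=0.01, size=n)  # near-collinear predictor
slope = 1.0 + 0.3 * coords[:, 0]  # effect of the first predictor varies over space


def make_target(X):
    """Synthetic response with a spatially varying slope plus noise."""
    return 70 + slope * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)


y_train = make_target(X_train)                                # plays the role of the training year
X_test = X_train + rng.normal(scale=0.05, size=X_train.shape)  # same locations, following year
y_test = make_target(X_test)

models = fit_local_forests(coords, X_train, y_train)
pred = predict_local(models, coords, coords, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"Test RMSE of the local forests: {rmse:.3f}")
```

Because each forest sees only its own neighbourhood, the fitted relationship can differ between locations, and the tree-based learner does not require the predictors to be linearly independent. Published GWRF formulations may additionally blend local and global predictions; that step is omitted here.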

Published

2025-09-02

How to Cite

[1]
S. Zuhdi, I. N. Fatatik, I. N. F. H. Prihasno, and H. A. A. Rozaq, “Geographically Weighted Random Forests for Human Development Index of Central Java Prediction,” J. Tek. Inform. (JUTIF), vol. 6, no. 4, pp. 2756–2768, Sep. 2025.
