COMPARISON OF MACHINE LEARNING METHODS IN CLASSIFYING POVERTY IN INDONESIA IN 2018
Abstract
Poverty remains one of the main problems in economic development, alongside inequality, unemployment, and economic growth. This study models poverty status directly as a discrete outcome using machine learning classification methods. The data are imbalanced: one category is much smaller than the other, so the "both" resampling method (combined oversampling and undersampling) is applied. Four machine learning methods are compared: Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that "both" resampling provides optimal results for all four methods. Judged by accuracy, sensitivity, specificity, AUC, and the Kappa coefficient, the best method is KNN. The KNN model has an accuracy of 0.73, a sensitivity of 0.68, a specificity of 0.78, and an AUC of 0.73.
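The resampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `resample_both` and `binary_metrics`, the 50/50 class target, and the confusion-matrix counts in the demo are all assumptions made for illustration. The counts were chosen only so that the resulting proportions match the reported accuracy, sensitivity, and specificity; they are not the study's data.

```python
import random


def resample_both(rows, labels, minority, seed=0):
    """Random undersampling of the majority class combined with random
    oversampling of the minority class.

    Assumption: each class is resampled to half of the original sample
    size. The abstract does not state the target proportion.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority]
    target = len(rows) // 2
    # Undersample the majority class without replacement.
    kept_majority = rng.sample(majority, min(target, len(majority)))
    # Oversample the minority class with replacement.
    drawn_minority = [(rng.choice(minority_rows), minority) for _ in range(target)]
    combined = kept_majority + drawn_minority
    rng.shuffle(combined)
    return [r for r, _ in combined], [y for _, y in combined]


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from the four
    cells of a binary confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical imbalanced sample: 10 minority (1) and 90 majority (0) cases.
rows = list(range(100))
labels = [1] * 10 + [0] * 90
new_rows, new_labels = resample_both(rows, labels, minority=1)
print(new_labels.count(1), new_labels.count(0))

# Hypothetical confusion-matrix counts on a balanced test set of 200 cases.
print(binary_metrics(tp=68, fn=32, fp=22, tn=78))
```

With a balanced test set, accuracy is the average of sensitivity and specificity, which is consistent with the reported values of 0.73, 0.68, and 0.78.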
Copyright (c) 2021 Pardomuan Robinson Sihombing, Ade Marsinta Arsani
This work is licensed under a Creative Commons Attribution 4.0 International License.