IDENTIFYING POTENTIAL CREDIT CARD PAYMENT DEFAULTS USING GMDKNN WITH LOF AS OUTLIER HANDLING
Abstract
Outliers strongly affect classification accuracy, and their presence can lower the accuracy of a classifier. The Generalised Mean Distance K-Nearest Neighbor (GMD-KNN) algorithm is a classification technique whose advantages are flexibility and responsiveness to attribute variations. This research classifies credit card data into current and bad (defaulting) payments, using the Local Outlier Factor (LOF) to handle outliers. The data consist of 30,000 credit card records taken from the UCI Machine Learning Repository. The method has four stages: data collection, data pre-processing to detect and remove outliers with LOF, classification with GMD-KNN, and evaluation of the classification results. The model performs best at an 80%:20% data split ratio with k=5, achieving 77.60% accuracy, 74.97% precision, 82.57% recall, 78.58% F1-Score, and 77.48% G-Mean.
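The five evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' implementation: the function name and the example counts are hypothetical, and G-Mean is assumed to follow the common definition, the square root of recall times specificity, which the abstract does not spell out.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and G-Mean from
    confusion-matrix counts (positive class = bad payment, by assumption)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only; they are not the paper's results.
m = classification_metrics(tp=80, fp=30, fn=20, tn=70)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under the same definitions, the reported precision (74.97%) and recall (82.57%) give an F1-Score of about 78.59%, which agrees with the reported 78.58% up to rounding.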
Copyright (c) 2024 Liony Puspita Dewi, Yulison Herry Chrisnanto, Rezki Yuniarti
This work is licensed under a Creative Commons Attribution 4.0 International License.