ANALYSIS OF THE EFFECTIVENESS OF POLYNOMIAL FIT SMOTE MESH ON IMBALANCE DATASET FOR BANK CUSTOMER CHURN PREDICTION WITH XGBOOST AND BAYESIAN OPTIMIZATION

  • Jhiro Faran, Post Graduate Information Technology Department, Universitas Nasional, Indonesia
  • Agung Triayudi, Post Graduate Information Technology Department, Universitas Nasional, Indonesia
Keywords: Bayesian Optimization, Churn, Imbalance data, SMOTE, XGBoost

Abstract

Churn in the banking industry, in which customers leave or stop using a bank's services, is a serious problem that requires an appropriate solution. The aim of this research is to predict churn with machine learning so that appropriate preventive action can be taken. The dataset contains records for 10,000 bank customers with 14 relevant features. Only about 20% of customers churn, which creates a class imbalance problem for classification. To address the imbalance, the SMOTE oversampling technique was applied, along with an extension of SMOTE called Polynomial Fit SMOTE Mesh (PFSM). PFSM works by connecting the data points with linear functions and generating synthetic data along each connection. Experimental results show that the XGBoost model trained with PFSM and tuned with Bayesian Optimization achieved 86.1% accuracy, 70.87% precision, 53.81% recall, and a 61.17% F-score. This indicates that the approach improves predictive capability and helps identify customers at risk of churn earlier. The research is relevant to the banking industry, where it can help banks retain their customers and improve business performance.
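The mesh idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mesh_oversample`, the parameter `points_per_segment`, and the choice to connect every pair of minority samples are assumptions made for illustration, based only on the description above.

```python
from itertools import combinations


def mesh_oversample(minority, points_per_segment=3):
    """Generate synthetic samples on the straight line between every
    pair of minority samples (an assumed reading of the mesh topology)."""
    # Evenly spaced interior fractions, so the original endpoints
    # are not duplicated in the synthetic set.
    fractions = [k / (points_per_segment + 1)
                 for k in range(1, points_per_segment + 1)]
    synthetic = []
    for a, b in combinations(minority, 2):
        for t in fractions:
            # Linear interpolation: a + t * (b - a), feature by feature.
            synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical toy data: three minority samples with two features each.
minority = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
synthetic = mesh_oversample(minority, points_per_segment=3)
print(len(synthetic))  # 3 pairs x 3 points per segment = 9
```

Because every pair is connected, the number of synthetic samples grows quadratically with the size of the minority class, so in practice the number of points per segment would need to be chosen with the target class ratio in mind.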

Published
2024-05-18
How to Cite
[1]
J. Faran and A. Triayudi, “ANALYSIS OF THE EFFECTIVENESS OF POLYNOMIAL FIT SMOTE MESH ON IMBALANCE DATASET FOR BANK CUSTOMER CHURN PREDICTION WITH XGBOOST AND BAYESIAN OPTIMIZATION”, J. Tek. Inform. (JUTIF), vol. 5, no. 3, pp. 661-667, May 2024.