Bank Customer Churn Prediction Using CTGAN-Augmented Data and Boosting-Based Ensemble Learning with SHAP Explainable AI

Authors

  • Mohamad Syazimmi Hersyaputra Department of Informatics, Institut Teknologi Sepuluh Nopember, Indonesia
  • Shintami Chusnul Hidayati Department of Informatics, Institut Teknologi Sepuluh Nopember, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.3.5578

Keywords:

Bank Customer Churn, Boosting-Based Ensemble Learning, CTGAN, Explainable AI, SHAP

Abstract

Customer churn prediction remains a fundamental concern in the banking domain due to its direct impact on revenue stability and long-term customer value. A key challenge in churn modeling is severe class imbalance, which often limits model sensitivity toward minority churn cases. This study aims to develop an integrated and explainable churn prediction framework that addresses class imbalance while maintaining robust predictive performance and interpretability. The proposed approach combines Conditional Tabular Generative Adversarial Networks (CTGAN) for data augmentation, a comparison of five boosting-based ensemble learning models, and SHapley Additive exPlanations (SHAP) to preserve model interpretability. CTGAN is used to synthesize high-fidelity instances of the churn class, yielding a class-balanced dataset that retains intricate tabular feature distributions. The five boosting-based ensemble models (XGBoost, CatBoost, Gradient Boosting Machine (GBM), Stochastic Gradient Boosting (SGB), and LightGBM) are systematically tuned using randomized hyperparameter optimization and evaluated under consistent experimental settings. Model performance is assessed using accuracy, precision, recall, and F1-score to capture classification performance under class imbalance. To ensure transparency, SHAP is applied to analyze the global importance of features influencing churn predictions. Experimental results indicate that CTGAN augmentation enhances model learning stability and churn detection capability. Among the evaluated models, CatBoost achieves the best results, with an accuracy of 0.9748 and an F1-score of 0.9178. The explainability analysis reveals that transactional features play a dominant role in churn predictions. The novelty of this study lies in a unified and explainable churn prediction framework that integrates CTGAN-based data augmentation, boosting ensembles, and interpretability for robust decision support in banking analytics.
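
The abstract reports accuracy alongside precision, recall, and F1-score because accuracy alone can look strong on imbalanced data. The following is a minimal sketch of that point, not the authors' evaluation code: the confusion-matrix counts, the `ConfusionCounts` class, and the `classification_metrics` helper are all hypothetical and chosen only for illustration. They are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with churn as the positive class."""
    tp: int  # churners correctly flagged
    fp: int  # non-churners wrongly flagged
    fn: int  # churners missed
    tn: int  # non-churners correctly passed


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, and F1, using 0.0 for undefined ratios."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 1000 customers, 160 of whom churn.
# A degenerate model that predicts "stays" for every customer.
always_stay = classification_metrics(ConfusionCounts(tp=0, fp=0, fn=160, tn=840))

# A hypothetical model that actually detects most churners.
detector = classification_metrics(ConfusionCounts(tp=120, fp=30, fn=40, tn=810))

for name, m in (("always_stay", always_stay), ("detector", detector)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

In this illustration the degenerate model still reaches 0.84 accuracy while its recall and F1-score are both zero, which is why minority-class metrics are reported together with accuracy.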

Published

2026-06-15

How to Cite

[1]
M. S. Hersyaputra and S. C. Hidayati, “Bank Customer Churn Prediction Using CTGAN-Augmented Data and Boosting-Based Ensemble Learning with SHAP Explainable AI”, J. Tek. Inform. (JUTIF), vol. 7, no. 3, pp. 2376–2394, Jun. 2026.