Performance Evaluation of Gradient Boosting Techniques for Predicting Customer Purchase Decisions
DOI: https://doi.org/10.52436/1.jutif.2026.7.2.5461

Keywords: CatBoost, Customer Purchase Prediction, Gradient Boosting, Machine Learning, SMOTE

Abstract
Customer purchase prediction remains a critical challenge in e-commerce and retail analytics, with significant implications for marketing strategies and business revenue. This research provides a detailed comparative evaluation of three advanced gradient boosting techniques (XGBoost, LightGBM, and CatBoost) for predicting customer purchasing behavior from review trends and demographic factors. The study employed a dataset of 100 customer records with attributes such as age, gender, review quality, and education level. After systematic feature engineering, including age group categorization and categorical feature combinations, and after addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), all three models were trained and evaluated using their default hyperparameter settings. The experimental results show that CatBoost achieved the best performance, with 78.26% accuracy, 0.8011 precision, 0.7826 recall, and a 0.7775 F1-score, outperforming LightGBM (73.91% accuracy) and XGBoost (60.87% accuracy). The evaluation includes confusion matrix analysis, precision-recall metrics, and visual comparisons across all performance dimensions. These findings provide valuable insights for practitioners selecting machine learning algorithms for customer purchase prediction, particularly in scenarios involving limited datasets and categorical features, and they contribute to the growing literature on gradient boosting techniques for consumer behavior prediction, with practical implications for e-commerce applications.
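The preprocessing and evaluation steps described in the abstract can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' code: the bin edges in `age_group`, the neighbour count `k`, and all sample data are assumptions, and a real pipeline would typically use imbalanced-learn's `SMOTE` and scikit-learn's metric functions instead.

```python
import random

def age_group(age):
    """Hypothetical age-bucket feature, illustrating the paper's
    'age group categorization' step (bin edges are assumed)."""
    if age < 25:
        return "young"
    if age < 45:
        return "adult"
    return "senior"

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a sample and one of its k nearest minority-class
    neighbours -- the core idea behind SMOTE."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic

def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1,
    matching the metric set reported in the study."""
    labels = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(labels)
    return acc, sum(precs) / n, sum(recs) / n, sum(f1s) / n
```

Oversampling the minority class before training (but only on the training split, to avoid leaking synthetic points into the test set) is the standard way SMOTE is combined with gradient boosting models such as those compared here.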
Copyright (c) 2026 Florentina Yuni Arini, Lyon Ambrosio Djuanda, Ananda Hisma Putra Kristianto, Muthia Nis Tiadah, Aufa Putra Wicaksono, Fatih Akbar Alim Putra

This work is licensed under a Creative Commons Attribution 4.0 International License.