Comparative Analysis of Classification Models for Sales Prediction in E-commerce: Decision Tree, Random Forest, SVM, Naive Bayes, and KNN

Authors

  • Eko Purwanto, Information System, Universitas Duta Bangsa Surakarta, Indonesia
  • Bangun Prajadi Cipto Utomo, Management, Universitas Duta Bangsa Surakarta, Indonesia
  • Hanifah Permatasari, Information System, Universitas Duta Bangsa Surakarta, Indonesia
  • Farahwahida Mohd, Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Malaysia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.6.5224

Keywords:

Classification models, E-commerce, Machine learning, Predictive analytics, Random Forest, Sales prediction

Abstract

The rapid growth of e-commerce has increased the need for accurate sales forecasting, which underpins effective marketing and inventory control. This study compares five classification models for predicting sales outcomes from e-commerce transaction data: Decision Tree, Random Forest, Support Vector Machine (SVM), Naive Bayes, and K-Nearest Neighbors (KNN). The models were evaluated on accuracy, precision, recall, F1-score, AUC, and Log Loss. Random Forest outperformed the other four models, reaching an accuracy of 97.5% and an AUC of 0.991. The study's contribution is a side-by-side comparison of these classifiers in the context of e-commerce in Indonesia, which offers guidance for developing more effective predictive algorithms in informatics. The results support the optimization of marketing strategies and add to the understanding of machine learning applications in sales forecasting. They also underscore the importance of choosing an appropriate model for sales forecasting, with implications for data-driven decision-making in the e-commerce sector.
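
The six evaluation criteria named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 decision threshold, the binary "sale / no sale" framing, and the toy labels and scores are all assumptions chosen for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, y_score, eps=1e-15):
    """Mean binary cross-entropy; scores are clipped to avoid log(0)."""
    total = 0.0
    for t, s in zip(y_true, y_score):
        s = min(max(s, eps), 1 - eps)
        total -= t * math.log(s) + (1 - t) * math.log(1 - s)
    return total / len(y_true)

def evaluate(y_true, y_score, threshold=0.5):
    """Compute all six metrics from true labels and predicted probabilities."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
        "log_loss": log_loss(y_true, y_score),
    }

# Hypothetical toy data: 1 = sale, 0 = no sale; scores are predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
metrics = evaluate(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC and Log Loss are computed from the predicted probabilities directly, which is one reason a comparison of classifiers typically reports both kinds of metric.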

Published

2026-01-05

How to Cite

[1]
E. Purwanto, B. P. Cipto Utomo, H. Permatasari, and F. Mohd, “Comparative Analysis of Classification Models for Sales Prediction in E-commerce: Decision Tree, Random Forest, SVM, Naive Bayes, and KNN”, J. Tek. Inform. (JUTIF), vol. 6, no. 6, pp. 5899–5915, Jan. 2026.