Evaluating SMOTE Performance for Imbalanced Multi-Label Sentiment Classification in MLSE Usability Testing of Mobile App Reviews

Hasan Basri; Wahyu Noviani Purwanti; Ihsan Alparisi

doi:10.52436/1.jutif.2026.7.2.5351

Authors

Hasan Basri, Information Systems, Faculty of Science and Technology, Universitas Terbuka, Indonesia
Wahyu Noviani Purwanti, Information Systems, Faculty of Science and Technology, Universitas Terbuka, Indonesia
Ihsan Alparisi, Information Systems, Faculty of Science and Technology, Universitas Terbuka, Indonesia

Keywords:

Imbalanced Data, Multi-label Classification, Sentiment Analysis, SMOTE, Usability Testing

Abstract

Imbalanced data poses a significant challenge in multi-label classification, especially when sentiment analysis is combined with usability testing of mobile application reviews. This study investigates how effectively the Synthetic Minority Over-sampling Technique (SMOTE) improves classification performance on a multi-label dataset of 10,000 Indonesian-language user reviews from the Google Play Store. The classification labels combine usability criteria with sentiment polarity, and several classes are strongly imbalanced. Three machine learning algorithms, SVM, Decision Tree, and Random Forest, were evaluated on datasets of increasing size (1,000 to 10,000 entries), each under both original and SMOTE-balanced conditions, using stratified 10-fold cross-validation with accuracy and F1-score as the primary metrics. The experimental results show that SMOTE significantly improves Decision Tree performance, mainly on smaller datasets, with inconsistent gains as the dataset grows; it provides modest, stable improvements for Random Forest; and it degrades SVM, which performs consistently better without SMOTE. The study concludes that SMOTE is not a universally effective solution and must be applied selectively based on model characteristics. These findings contribute to the Machine Learning for Software Engineering (ML4SE) domain and to the field of informatics by highlighting the importance of aligning resampling techniques with algorithmic behaviour when dealing with highly imbalanced multi-label text classification tasks.
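
For readers unfamiliar with the resampling step, the following is a minimal sketch of the interpolation idea behind SMOTE. It is not the authors' implementation, which the abstract does not describe. The function name `smote`, its parameters, and the toy 2-D feature vectors are all hypothetical, and the sketch picks base samples at random rather than looping over every minority sample. The abstract also does not state whether resampling was applied inside each cross-validation fold; a common practice is to oversample the training folds only, so that synthetic points do not leak into the evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data standing in for vectorised minority-class reviews.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the method densifies the minority region rather than adding new information. This may help explain why its effect differs across classifiers, as the results above report.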
