SENTIMENT ANALYSIS CLASSIFICATION IN WOMEN'S E-COMMERCE REVIEWS WITH MACHINE LEARNING APPROACH

  • Alfiki Diastama Afan Firdaus, Faculty of Information Technology and Data Science, Universitas Sebelas Maret, Indonesia
  • Rizki Dwi Rahmawan, Faculty of Information Technology and Data Science, Universitas Sebelas Maret, Indonesia
  • Yuzzar Rizky Mahendra, Faculty of Information Technology and Data Science, Universitas Sebelas Maret, Indonesia
  • Hasan Dwi Cahyono, Faculty of Information Technology and Data Science, Universitas Sebelas Maret, Indonesia
Keywords: k-nearest neighbour, Naive Bayes, sentiment analysis, support vector machine, user reviews

Abstract

User reviews are an important element of e-commerce: they help potential buyers make decisions based on the experiences and opinions of other customers, as in women's e-commerce reviews. Because reviews can express positive, neutral, or negative sentiment, understanding customer perceptions from them is challenging. Classifying the sentiment of reviews addresses this problem. Several classification techniques have already been applied, but there is still room for development in combining simple machine learning techniques with sampling methods that overcome class imbalance in the data. This paper compares three classification algorithms, Naive Bayes, support vector machine (SVM), and k-nearest neighbour (KNN), to determine the most accurate model. Preprocessing also includes balancing the dataset with random oversampling (ROS) and the synthetic minority oversampling technique (SMOTE). SVM with ROS achieved the best results, with an accuracy of about 0.94, a precision of 0.93, a recall of 0.94, and an F1-score of 0.92. These results indicate that sampling techniques such as ROS and SMOTE can effectively balance imbalanced datasets and thereby improve classification performance. The findings can serve as a reference for developing more efficient and accurate sentiment classification models, especially for imbalanced data.
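The abstract contrasts two ways of balancing the classes before training. The sketch below illustrates both under stated assumptions: features are dense numeric vectors (for example a TF-IDF matrix converted to an array), labels are integers, and the helper names `random_oversample` and `smote` and the toy data are hypothetical, not the authors' code. ROS duplicates existing rows of the smaller classes, while SMOTE creates new rows by interpolating between a minority sample and one of its k nearest neighbours from the same class.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the authors' implementation.


def random_oversample(X, y, rng):
    """Random oversampling (ROS): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            # Sample with replacement, so the same row can be copied repeatedly.
            extra = rng.choice(idx, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)


def smote(X, y, rng, k=5):
    """SMOTE: each synthetic row lies on the segment between a minority
    sample and one of its k nearest neighbours from the same class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_min = X[y == cls]
        if len(X_min) < 2:
            raise ValueError("SMOTE needs at least two samples in each minority class")
        # Pairwise Euclidean distances inside this class (quadratic memory).
        d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        k_eff = min(k, len(X_min) - 1)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        base = rng.integers(0, len(X_min), size=n_new)
        nbr = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(n_new, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Usage on toy 2-D features (illustrative values, not the review dataset):
# label 2 has 8 rows, label 1 has 3 rows, label 0 has 2 rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 2))
y = np.array([2] * 8 + [1] * 3 + [0] * 2)
X_ros, y_ros = random_oversample(X, y, rng)
X_sm, y_sm = smote(X, y, rng, k=5)
print(np.bincount(y_ros))  # [8 8 8]
print(np.bincount(y_sm))   # [8 8 8]
```

Both methods leave the original rows in place and only append new ones. The difference is that ROS adds exact copies, whereas SMOTE adds points that did not exist before but stay within the region spanned by each minority class. The pairwise-distance step is quadratic in class size, so this version is only meant to show the mechanics. For real review data, a library implementation such as imbalanced-learn's `RandomOverSampler` and `SMOTE` (both exposing `fit_resample(X, y)`) is the usual choice. Resampling is normally applied to the training split only, so that copies or interpolations of test rows do not inflate the evaluation scores.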

Published
2024-12-28
How to Cite
A. D. Afan Firdaus, R. D. Rahmawan, Y. R. Mahendra, and H. D. Cahyono, “SENTIMENT ANALYSIS CLASSIFICATION IN WOMEN’S E-COMMERCE REVIEWS WITH MACHINE LEARNING APPROACH”, J. Tek. Inform. (JUTIF), vol. 5, no. 6, pp. 1549-1559, Dec. 2024.