Analyzing Marketplace Reviews Using Word2Vec, CNN, and Deep K-Means with Sociolinguistic Approaches

Authors

  • Fahry, Computer Science Study Program, Universitas Bumigora, Indonesia
  • Titik Ceriyani Miswaty, English Literature Study Program, Universitas Bumigora, Indonesia
  • Harun, Business Study Program, James Cook University, Singapore

DOI:

https://doi.org/10.52436/1.jutif.2025.6.6.5340

Keywords:

Aspect-Based Sentiment Analysis, Convolutional Neural Network, Deep K-Means Clustering, Sociolinguistic Variation, Word2Vec Embedding

Abstract

This study investigates the effectiveness of deep learning methods in analyzing linguistically diverse customer reviews on Shopee to generate actionable product insights. By integrating Word2Vec, Convolutional Neural Networks (CNN), and Deep K-Means clustering, the proposed workflow moves beyond simple polarity detection toward aspect-based sentiment analysis. Customer reviews were preprocessed and represented using Word2Vec (skip-gram) to capture semantic proximity across informal registers, slang, abbreviations, and code-switching. A one-dimensional CNN then classified reviews into positive and negative sentiments, achieving 93–94% accuracy with balanced F1-scores across both classes. To extract aspect-level insights, reviews were projected into a latent space via an autoencoder and clustered using K-Means, with evaluation metrics (Silhouette ≈ 0.6; DBI ≈ 0.5) confirming adequate cohesion and separation. Positive clusters highlighted product design, durability, and ease of use, while negative clusters emphasized material quality, packaging, and delivery issues. These findings demonstrate that deep learning can adapt to sociolinguistic variation in Indonesian e-commerce discourse while providing structured, socially meaningful insights. This research is significant for the field of Informatics as it advances Natural Language Processing techniques for multilingual and code-switched data, addressing a key challenge in real-world text mining applications. The approach offers practical value for sellers in improving product quality, enhancing customer satisfaction, and refining marketing strategies.
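
The two cluster-quality metrics reported in the abstract (Silhouette and the Davies-Bouldin Index, DBI) can be illustrated with a minimal sketch. It uses hand-made 2D points in place of the study's autoencoder latent vectors, and the function names, variable names, and toy data are illustrative assumptions, not the authors' implementation. A Silhouette closer to 1 and a DBI closer to 0 both indicate compact, well-separated clusters.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def silhouette_score(clusters):
    """Mean silhouette over all points; `clusters` is a list of point lists."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            # (the distance from p to itself is 0, so dividing by n - 1 is correct)
            a = sum(euclid(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(
                sum(euclid(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin_index(clusters):
    """Average, over clusters, of the worst scatter-to-separation ratio."""
    cents = [centroid(c) for c in clusters]
    # scatter: mean distance of each cluster's points to its own centroid
    scatter = [
        sum(euclid(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / euclid(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical toy data: two tight, well-separated groups of points.
toy_clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)],
]

sil = silhouette_score(toy_clusters)
dbi = davies_bouldin_index(toy_clusters)
print(f"silhouette = {sil:.3f}, DBI = {dbi:.3f}")
```

In practice these values would more likely be computed with a library routine on the latent vectors and the K-Means labels; the plain-Python version here only makes the definitions explicit.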

Published

2025-12-23

How to Cite

[1]
F. Fahry, T. C. Miswaty, and H. Harun, “Analyzing Marketplace Reviews Using Word2Vec, CNN, and Deep K-Means with Sociolinguistic Approaches”, J. Tek. Inform. (JUTIF), vol. 6, no. 6, pp. 5489–5502, Dec. 2025.