Evaluating Lexicon Weighting and Machine Learning Models for Sentiment Classification of Indonesian Mangrove Ecotourism Reviews

Authors

  • Ferdi Chahyadi, Informatics Engineering, Universitas Maritim Raja Ali Haji, Indonesia
  • Alena Uperiati, Software Engineering Technology, Politeknik Negeri Batam, Indonesia
  • Risdy Absari Indah Pratiwi, Digital Business, Universitas Maritim Raja Ali Haji, Indonesia
  • Nur Hamid, Interdisciplinary Research Center for Smart Mobility and Logistics (IRC-SML), King Fahd University of Petroleum and Minerals, Saudi Arabia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.6.5563

Keywords:

Lexicon Weighting, Logistic Regression, Machine Learning, Mangrove Ecotourism, Sentiment Analysis, Support Vector Machine

Abstract

Sentiment analysis of ecotourism reviews is challenging because of descriptive writing styles, ambiguous words, and contextual polarity shift, in which a word's sentiment changes with its context. These characteristics often cause lexicon-based approaches to produce unstable polarity labels. This study evaluates how two lexicon weighting methods, Mean Weighting and Summation Weighting, affect the initial sentiment labeling of mangrove ecotourism reviews, and assesses the performance of machine learning models trained on those labels. The method comprises text preprocessing, lexicon-based scoring with the InSet lexicon, feature extraction with Term Frequency–Inverse Document Frequency (TF–IDF), and the training of two classification algorithms, Support Vector Machine (SVM) and Logistic Regression (LR). The results show that Mean Weighting produces more stable polarity scores and higher model performance. SVM with Mean Weighting achieves the best results, with an accuracy of 0.902, macro precision of 0.876, macro recall of 0.819, a macro F1-score of 0.841, and a weighted F1-score of 0.899. LR with Mean Weighting reaches an accuracy of 0.891 with a similar performance pattern. In contrast, Summation Weighting yields lower performance for both algorithms. Error analysis indicates that neutral sentences and ambiguous words such as “bagus” (good) and “ramai” (crowded or lively) frequently lead to misclassification. These findings show that the choice of lexicon weighting method plays a crucial role in sentiment classification accuracy, and they contribute to the development of hybrid approaches in text mining and sentiment analysis for the Indonesian language.
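
The difference between the two weighting schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon entries and weights below are hypothetical stand-ins for InSet, the function names are invented for illustration, and the sketch assumes that Mean Weighting averages over the lexicon words matched in a review and that a score of zero maps to the neutral label.

```python
# Hypothetical toy lexicon; the words are Indonesian but the weights are
# illustrative and are NOT taken from the real InSet lexicon.
TOY_LEXICON = {
    "bagus": 4,    # good
    "indah": 3,    # beautiful
    "sejuk": 2,    # cool, fresh
    "kotor": -4,   # dirty
    "rusak": -4,   # damaged
}


def matched_weights(tokens, lexicon):
    """Return the weights of the tokens that appear in the lexicon."""
    return [lexicon[t] for t in tokens if t in lexicon]


def summation_score(tokens, lexicon=TOY_LEXICON):
    """Summation Weighting: add up the weights of all matched words."""
    return sum(matched_weights(tokens, lexicon))


def mean_score(tokens, lexicon=TOY_LEXICON):
    """Mean Weighting (assumed form): average weight of matched words."""
    weights = matched_weights(tokens, lexicon)
    return sum(weights) / len(weights) if weights else 0.0


def label(score):
    """Map a polarity score to a label; zero is treated as neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


short_review = "tempat ini bagus".split()
long_review = "pemandangan indah udara sejuk tempat bagus jembatan rusak".split()

for review in (short_review, long_review):
    print(summation_score(review), mean_score(review), label(mean_score(review)))
```

In this toy setting the summed score depends on how many lexicon words a review contains, while the mean stays within the range of the lexicon weights regardless of review length. That is one plausible reason a mean-based score could behave more stably on long, descriptive reviews, although the sketch does not reproduce the study's results.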

Published

2025-12-23

How to Cite

[1]
F. Chahyadi, A. Uperiati, R. A. I. Pratiwi, and N. Hamid, “Evaluating Lexicon Weighting and Machine Learning Models for Sentiment Classification of Indonesian Mangrove Ecotourism Reviews”, J. Tek. Inform. (JUTIF), vol. 6, no. 6, pp. 5679–5698, Dec. 2025.