Improving the Performance of Machine Learning Classifiers in Sentiment Analysis of Jenius Application Using Latent Dirichlet Allocation in Text Preprocessing

Vincentius Riandaru Prasetyo; Njoto Benarkah; Bayu Aji Hamengku Rahmad

doi:10.52436/1.jutif.2025.6.5.5238

Authors

Vincentius Riandaru Prasetyo Department of Informatics Engineering, University of Surabaya, Surabaya, Indonesia
Njoto Benarkah Department of Informatics Engineering, University of Surabaya, Surabaya, Indonesia
Bayu Aji Hamengku Rahmad Department of Informatics Engineering, University of Surabaya, Surabaya, Indonesia

Keywords:

Digital Banking, Jenius Application, Latent Dirichlet Allocation, Machine Learning Classifiers, Sentiment Analysis, Text Preprocessing

Abstract

Sentiment analysis aims to classify a person’s opinion into a specific sentiment class, such as positive or negative. The preprocessing steps chosen can influence the performance of a sentiment analysis model. Latent Dirichlet Allocation (LDA), commonly used for topic modelling, can serve as an additional preprocessing step that identifies relevant words associated with a particular sentiment label. This study assesses whether applying LDA in the preprocessing stage improves the performance of five machine learning classifiers: Naïve Bayes, Decision Tree, K-Nearest Neighbors (KNN), Logistic Regression, and Support Vector Machine (SVM). The dataset comprises 1,800 reviews of the Jenius application, 900 labelled positive and 900 labelled negative. Words with an LDA score of at least 0.15 were given additional weight in the TF-IDF stage before model training. The trained models were evaluated using accuracy, precision, recall, and F1-score. Using LDA in preprocessing improved the performance of all classification models by 1-3% on most evaluation metrics. Logistic Regression achieved the best performance, followed by SVM and KNN. These improvements are consistent with LDA reducing semantic noise and improving the feature representation. The approach can also support the monitoring of customer opinions in the digital banking sector, enabling rapid and accurate identification of priority issues. Future research could compare LDA with other topic modelling and feature extraction methods, expand the dataset, and extend the approach to multiclass models.
