LEARNING RATE AND EPOCH OPTIMIZATION IN THE FINE-TUNING PROCESS FOR INDOBERT’S PERFORMANCE ON SENTIMENT ANALYSIS OF MYTELKOMSEL APP REVIEWS
Abstract
With the advancement of the digital era, mobile applications in Indonesia are growing rapidly; the MyTelkomsel app is among the leading ones, with over 100 million downloads. Given this scale, user reviews are a crucial resource for improving the quality of services and products. This study proposes a sentiment analysis approach based on the Indonesian language model IndoBERT, focusing on optimizing the learning rate and number of epochs during fine-tuning to improve sentiment analysis performance on MyTelkomsel app reviews. IndoBERT, pre-trained on the Indo4B dataset, is well suited to this task given its proven performance on Indonesian text classification. The BERT architecture provides rich, contextual word vector representations, enabling more accurate sentiment analysis. This study emphasizes fine-tuning with the goal of improving the model's accuracy and efficiency. The test results show that the model achieves a high accuracy of 96% with a batch size of 16, a learning rate of 1e-6, and 3 epochs; tuning the learning rate and epoch values proved key to refining the model. These results provide in-depth insight into user sentiment toward the MyTelkomsel app and practical guidance on using IndoBERT for sentiment analysis of Indonesian-language reviews.
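The learning-rate and epoch optimization described above can be sketched as a small grid search. The candidate values other than the reported best pair (learning rate 1e-6, 3 epochs) are illustrative assumptions, and the `evaluate` callback stands in for an actual IndoBERT fine-tuning run scored on held-out reviews; this is a minimal sketch, not the authors' exact procedure.

```python
from itertools import product

# Candidate hyperparameter values. The abstract reports lr=1e-6 with
# 3 epochs as the best setting; the other candidates are assumptions
# added for illustration.
LEARNING_RATES = [1e-5, 5e-6, 1e-6]
EPOCHS = [2, 3, 4]

def grid_search(evaluate):
    """Try every (learning_rate, epochs) pair and return the best one.

    `evaluate` is a caller-supplied function mapping a hyperparameter
    pair to a validation accuracy. In practice it would fine-tune
    IndoBERT (batch size 16) with those settings and score the
    held-out MyTelkomsel reviews.
    """
    best_pair, best_acc = None, -1.0
    for lr, n_epochs in product(LEARNING_RATES, EPOCHS):
        acc = evaluate(lr, n_epochs)
        if acc > best_acc:
            best_pair, best_acc = (lr, n_epochs), acc
    return best_pair, best_acc
```

With a real fine-tuning callback plugged in, the loop would surface the (1e-6, 3) setting whenever that pair yields the highest validation accuracy.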
Copyright (c) 2024 Muhammad Naufal Zaidan, Yuliant Sibaroni, Sri Suryani Prasetyowati
This work is licensed under a Creative Commons Attribution 4.0 International License.