TOPIC MODELING IN COVID-19 VACCINATION REFUSAL CASES USING LATENT DIRICHLET ALLOCATION AND LATENT SEMANTIC ANALYSIS
COVID-19 vaccination is a program provided by the Indonesian government to minimize the spread of the virus. The program has been accompanied by circulating issues that have caused controversy and rejection of vaccination on social media, especially Twitter. Many factors influence vaccine rejection on Twitter. To summarize frequently discussed topics and uncover hidden ones, this study applies Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) to a dataset of 1797 records scraped from Twitter. Both models require the words to be converted into a matrix. Before LDA topic modeling, the dataset is converted into a bag-of-words (BOW) representation; for LSA topic modeling, frequently occurring words in the dataset are weighted using Term Frequency-Inverse Document Frequency (TF-IDF). The aim of the study is to find and summarize hidden information in the form of frequently discussed topics, and thereby to understand public opinion on COVID-19 vaccination refusal. The LDA and LSA methods display topics based on probabilistic and mathematical calculations of word occurrences in each topic in the documents. The resulting topics are then evaluated by coherence score, with a limit of 20 topics, to identify the best value. In further modeling experiments with the LDA and LSA models, this study takes the 6 topics with the highest coherence values, including the right of individuals to choose whether to be vaccinated or not (0.484607), the Ribka Tjiptaning controversy (0.473368), rejection of the COVID-19 vaccine by groups represented by public figures (0.463631), punishment for non-compliance in the form of fines (0.324924), and halal certification (0.312521).
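The two input representations named in the abstract can be illustrated with a minimal sketch. The toy documents, the function names `bow_matrix` and `tfidf_matrix`, and the particular TF-IDF variant (term count divided by document length, times log(N / df)) are assumptions made for illustration; the study's actual preprocessing and weighting formula may differ.

```python
import math
from collections import Counter

# Hypothetical toy documents standing in for preprocessed tweets
# (not the study's data).
docs = [
    ["vaccine", "halal", "certification"],
    ["vaccine", "refusal", "fine"],
    ["fine", "refusal", "punishment"],
]

# Sorted vocabulary so that the column order is deterministic.
vocab = sorted({word for doc in docs for word in doc})


def bow_matrix(documents, vocabulary):
    """Document-term matrix of raw counts (the usual input to LDA)."""
    rows = []
    for doc in documents:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocabulary])
    return rows


def tfidf_matrix(documents, vocabulary):
    """Document-term matrix of TF-IDF weights (the usual input to LSA).

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each word.
    df = {w: sum(1 for doc in documents if w in doc) for w in vocabulary}
    rows = []
    for doc in documents:
        counts = Counter(doc)
        rows.append(
            [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocabulary]
        )
    return rows


bow = bow_matrix(docs, vocab)
tfidf = tfidf_matrix(docs, vocab)

for word, count, weight in zip(vocab, bow[0], tfidf[0]):
    print(f"{word:15s} count={count} tfidf={weight:.3f}")
```

In this sketch a word that appears in fewer documents (such as "halal") receives a higher TF-IDF weight than a word shared across documents (such as "vaccine"), whereas the BOW matrix counts both equally.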
Copyright (c) 2023 Ulfah Malihatin S
This work is licensed under a Creative Commons Attribution 4.0 International License.