TOPIC MODELING USING THE LATENT DIRICHLET ALLOCATION METHOD ON WIKIPEDIA PANDEMIC COVID-19 DATA IN INDONESIA
Abstract
Wikipedia is a web-based encyclopedia widely used to look up information. No prior work has clustered the topics covered in the Wikipedia article on the COVID-19 pandemic in Indonesia. This study applies topic modeling with the Latent Dirichlet Allocation (LDA) method, currently the most widely used topic modeling method, to COVID-19 data taken from Wikipedia. The dataset consists of 6658 English words, and the occurrences of each word are counted in a corpus. LDA clusters the text based on the word counts in the corpus, with the number of clusters, the number of topics, and the number of iterations determined as part of the procedure. The method first assigns every word to a topic in a semi-random distribution; then, at each iteration, it calculates the probability of each topic in the dataset and the probability of each word given the topic. The purpose of the study is to group the information contained in the Wikipedia article so that the results can serve as evaluation material for improving Wikipedia's services and handling. Five iteration tests were run on the topic model, using a number of different topic counts. Analysis of the final results identified one number of topics that gave the best results, with health as the most discussed topic.
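The assign-then-resample procedure described in the abstract can be sketched as a collapsed Gibbs sampler, one common way to fit LDA. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all introduced here for demonstration and do not come from the paper's dataset.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    alpha and beta are assumed symmetric Dirichlet priors (hypothetical values).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Step 1: give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    # Step 2: repeatedly resample each token's topic.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Weight of topic k is proportional to
                # P(topic k | document) * P(word | topic k).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


# Made-up toy documents, used only to exercise the sampler.
docs = [
    "virus vaccine hospital health virus patient".split(),
    "economy market trade bank economy budget".split(),
    "hospital patient health vaccine doctor".split(),
    "bank budget market trade tax".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print("topic", k, counts.most_common(3))
```

Running the sampler with several values of `num_topics` and comparing the resulting topics mirrors, in spirit, the abstract's tests over different topic counts; how the study scored each run is not stated in the abstract, so no scoring step is shown here.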
Copyright (c) 2022 Wilujeng Ayu Nawang Sari
This work is licensed under a Creative Commons Attribution 4.0 International License.