TEXT CLUSTERING IN KARO LANGUAGE USING TF-IDF WEIGHTING AND K-MEANS CLUSTERING

  • Trisna Amanda Br Sembiring Program Studi Ilmu Komputer, Universitas Islam Negeri Sumatera Utara, Indonesia
  • Muhammad Siddik Hasibuan Program Studi Ilmu Komputer, Universitas Islam Negeri Sumatera Utara, Indonesia
Keywords: Clustering, Karo Language, K-means, RapidMiner, Tf-Idf

Abstract

The aim of this research is to compare the Karo dialects by percentage and to look for clusters among them. Words are weighted with TF-IDF after a preprocessing sequence of tokenizing, case transformation, stopword filtering, and token filtering, and clusters are then searched for with k-means clustering in RapidMiner. The TF-IDF weighting gives the Jahe and Julu dialects each 37.5% for the number of word occurrences and 62.5% for the total number of documented words, while the Singalor Lau, Teruh Deleng, and Liang Melas dialects each give 38% and 62%, respectively. K-means clustering on a sample of 100 items produces five clusters: cluster 0 with 68 items, cluster 1 with 3 items, cluster 2 with 15 items, cluster 3 with 10 items, and cluster 4 with 4 items. The conclusion is that the Jahe and Julu dialects are identical, and that the Singalor Lau, Teruh Deleng, and Liang Melas dialects are likewise identical.
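
The preprocessing chain and TF-IDF weighting described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' RapidMiner process: the stopword set and the minimum token length of 2 are hypothetical placeholders (the abstract gives neither), the toy documents are not the paper's data, and the weight uses the common tf × log(N / df) form, where tf is a term's share of the tokens in its document, N is the number of documents, and df is the number of documents containing the term. RapidMiner's own TF-IDF option may differ in details such as vector normalization.

```python
import math
import re
from collections import Counter

# Hypothetical placeholders: the paper's actual Karo stopword list and
# token-length threshold are not given in the abstract.
STOPWORDS = {"and", "the"}
MIN_TOKEN_LENGTH = 2


def preprocess(text):
    """Tokenize, transform cases, filter stopwords, filter tokens by length."""
    tokens = re.findall(r"[^\W\d_]+", text)              # tokenize on runs of letters
    tokens = [t.lower() for t in tokens]                   # transform cases
    tokens = [t for t in tokens if t not in STOPWORDS]     # filter stopwords
    return [t for t in tokens if len(t) >= MIN_TOKEN_LENGTH]  # filter short tokens


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))          # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

For example, `tf_idf(["Alpha and beta beta", "alpha gamma", "the delta beta x"])` drops "and", "the", and "x" during preprocessing, and gives "gamma" a higher weight than "alpha" in the second document because "gamma" occurs in only one document.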

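The k-means step can likewise be sketched as Lloyd's algorithm over TF-IDF vectors supplied as {term: weight} dictionaries. Again this is an illustrative assumption rather than RapidMiner's k-Means operator: squared Euclidean distance, random initialization from the data points, the hypothetical `seed` and `max_iter` parameters, and the rule of keeping a centroid unchanged when its cluster empties are all choices made here. The abstract's five clusters (0 to 4) would correspond to `k=5`.

```python
import random
from collections import Counter


def to_dense(weight_dicts):
    """Turn {term: weight} dicts into equal-length lists over a shared vocabulary."""
    vocab = sorted({t for w in weight_dicts for t in w})
    return [[w.get(t, 0.0) for t in vocab] for w in weight_dicts]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init from data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_sizes(labels):
    """Item count per cluster, e.g. {0: 68, 1: 3, ...} in the abstract's format."""
    return dict(sorted(Counter(labels).items()))
```

`cluster_sizes` gives the same kind of summary as the abstract's cluster counts. Because k-means depends on its initialization, different seeds can yield different partitions of the same data.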


Published
2023-11-13
How to Cite
[1]
T. A. Br Sembiring and M. S. Hasibuan, “TEXT CLUSTERING IN KARO LANGUAGE USING TF-IDF WEIGHTING AND K-MEANS CLUSTERING”, J. Tek. Inform. (JUTIF), vol. 4, no. 5, pp. 1257-1265, Nov. 2023.