Implementation of Enhanced Confix Stripping Stemming and Chi-Squared Feature Selection on Classification UIN Walisongo Website with Naïve Bayes Classifier

Authors

  • Muhammad Naufal Muhadzib Al-Faruq, Information Technology, Science and Technology, Walisongo State Islamic University, Indonesia
  • Wenty Dwi Yuniarti, Information Technology, Science and Technology, Walisongo State Islamic University, Indonesia
  • Maya Rini Handayani, Information Technology, Science and Technology, Walisongo State Islamic University, Indonesia
  • Khotibul Umam, Information Technology, Science and Technology, Walisongo State Islamic University, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.3.4670

Keywords:

Academic News, Chi-Squared, Classification, Enhanced Confix Stripping Stemmer, Naïve Bayes Classifier

Abstract

Classifying academic news on university websites remains difficult because the volume of content keeps growing and efficient categorization systems are lacking. At UIN Walisongo Semarang, this makes it harder for students, faculty, and the public to find relevant information. This study develops an automated academic news classification system to address the problem. We applied a Naïve Bayes Classifier with Term Frequency weighting, the Enhanced Confix Stripping Stemmer for Indonesian-language preprocessing, and Chi-Squared feature selection to identify the most informative terms. The dataset consisted of 880 academic news articles from UIN Walisongo’s website, split into 704 training and 176 testing documents (an 80/20 split). The system achieved 95% accuracy on the test set. To evaluate generalizability, we used a separate evaluation set of 12 new articles and obtained 83.3% accuracy (10 of 12 articles classified correctly). The preprocessing stage played a vital role in reducing morphological complexity, while Chi-Squared scoring improved the relevance of the selected features. These results highlight the importance of robust text classification techniques in academic information systems, particularly for Indonesian, whose morphology poses distinct challenges. The proposed model demonstrates strong performance and scalability, with potential for integration into academic portals to improve information retrieval. The study contributes to Natural Language Processing and applied machine learning in academic settings and provides an effective solution for automated academic content management in institutional information systems.
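The pipeline summarized in the abstract (term-frequency counts, Chi-Squared feature selection, then a Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the labels "research" and "scholarship", and the helper names `chi2_scores`, `select_top_k`, `train_nb`, and `predict_nb` are hypothetical. The sketch assumes tokens have already been stemmed (the Enhanced Confix Stripping Stemmer is not implemented here), scores each term with a one-vs-rest 2x2 Chi-Squared table over document presence, and uses a multinomial Naïve Bayes with add-one smoothing, which is one common variant.

```python
import math
from collections import Counter, defaultdict


def chi2_scores(docs, labels):
    """Score each term by its largest one-vs-rest chi-squared value."""
    n = len(docs)
    class_count = Counter(labels)
    # df[term][cls] = number of documents of class cls that contain term
    df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1
    scores = {}
    for term, per_class in df.items():
        term_total = sum(per_class.values())
        best = 0.0
        for cls, n_cls in class_count.items():
            a = per_class[cls]      # term present, document in cls
            b = term_total - a      # term present, document in another class
            c = n_cls - a           # term absent, document in cls
            d = n - a - b - c       # term absent, document in another class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:               # skip degenerate tables
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties at the cutoff are arbitrary)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes on raw term counts with add-one smoothing."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for cls, n_cls in class_docs.items():
        total = sum(term_counts[cls].values())
        log_prior = math.log(n_cls / len(docs))
        log_like = {
            t: math.log((term_counts[cls][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[cls] = (log_prior, log_like)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior; unknown terms are ignored."""
    def score(cls):
        log_prior, log_like = model[cls]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Hypothetical, already-stemmed toy corpus (not data from the study).
docs = [
    ["seminar", "riset", "dosen"],
    ["riset", "jurnal", "dosen"],
    ["beasiswa", "daftar", "mahasiswa"],
    ["beasiswa", "mahasiswa", "seleksi"],
]
labels = ["research", "research", "scholarship", "scholarship"]

scores = chi2_scores(docs, labels)
vocab = select_top_k(scores, k=4)
model = train_nb(docs, labels, vocab)
prediction = predict_nb(model, ["beasiswa", "mahasiswa", "baru"])
```

In this toy corpus, terms that appear in every document of one class and in none of the other receive the highest Chi-Squared score, so they survive selection and drive the classifier's decision. A real system would replace the toy lists with stemmed article text and choose `k` by validation.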

Published

2025-06-23

How to Cite

[1] M. N. Muhadzib Al-Faruq, W. D. Yuniarti, M. R. Handayani, and K. Umam, “Implementation of Enhanced Confix Stripping Stemming and Chi-Squared Feature Selection on Classification UIN Walisongo Website with Naïve Bayes Classifier,” J. Tek. Inform. (JUTIF), vol. 6, no. 3, pp. 1279–1297, Jun. 2025.