ENHANCING SENTIMENT ANALYSIS WITH CHATBOTS: A COMPARATIVE STUDY OF TEXT PRE-PROCESSING

  • Indri Tri Julianto Department of Computer Science, Institut Teknologi Garut, Indonesia
  • Dede Kurniadi Department of Computer Science, Institut Teknologi Garut, Indonesia
  • Benedicto B. Balilo Jr CS/IT Department, College of Science, Bicol University, Legazpi City, Philippines
Keywords: alternative, Chat GPT-3.5 Open AI, Google Bard, sentiment analysis, text pre-processing

Abstract

Text pre-processing plays a crucial role in Sentiment Analysis. Chatbots built on large language models, such as Chat GPT-3.5 by OpenAI and Google Bard, offer an alternative way to carry out this step. This study evaluates how well both chatbots perform text pre-processing, using a dataset crawled from X (formerly Twitter). The pre-processed outputs of Chat GPT-3.5 and Google Bard are compared using the Decision Tree and Naïve Bayes algorithms and validated with K-Fold Cross Validation (K = 10) under three sampling methods: Linear, Shuffled, and Stratified Sampling. Chat GPT-3.5 performs best with the Decision Tree algorithm, 10-fold cross-validation, and Stratified Sampling, achieving an Accuracy of 90.68%, Precision of 90.63%, and Recall of 100%. Google Bard performs best with the Decision Tree algorithm, 10-fold cross-validation, and Shuffled Sampling, achieving an Accuracy of 74.00%, Precision of 72.73%, and Recall of 98.77%. The study concludes that both Chat GPT-3.5 and Google Bard are viable alternatives for text pre-processing in Sentiment Analysis, with Chat GPT-3.5 outperforming Google Bard. The results were validated against human annotations, which, under the same configuration (Decision Tree, 10-fold cross-validation, Stratified Sampling), achieved an Accuracy of 85.20%, Precision of 85.71%, and Recall of 99.03%. This suggests that Chat GPT-3.5's text pre-processing performance is on par with human annotation.
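The evaluation protocol summarized above (10-fold cross-validation with Linear, Shuffled, or Stratified Sampling, scored by Accuracy, Precision, and Recall) can be sketched in plain Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the fold-assignment rules follow the usual definitions of the three sampling options, the `"positive"` label name and `seed` value are assumptions, predictions are pooled across folds before scoring (averaging per-fold scores is a common alternative), and `majority_class` is a hypothetical placeholder for the Decision Tree and Naïve Bayes classifiers used in the study.

```python
import random
from collections import Counter, defaultdict


def make_folds(labels, k=10, sampling="stratified", seed=42):
    """Assign example indices to k folds using one of three sampling options."""
    n = len(labels)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    if sampling == "linear":
        # Consecutive blocks in the original order, no shuffling.
        for i in range(n):
            folds[i * k // n].append(i)
    elif sampling == "shuffled":
        # Random permutation first, then consecutive blocks.
        order = list(range(n))
        rng.shuffle(order)
        for pos, i in enumerate(order):
            folds[pos * k // n].append(i)
    elif sampling == "stratified":
        # Shuffle within each class, then deal indices round-robin so every
        # fold keeps roughly the overall class proportions.
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        pos = 0
        for cls in sorted(by_class):
            members = by_class[cls]
            rng.shuffle(members)
            for i in members:
                folds[pos % k].append(i)
                pos += 1
    else:
        raise ValueError(f"unknown sampling method: {sampling!r}")
    return folds


def scores(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def majority_class(train_texts, train_labels):
    """Hypothetical placeholder learner: always predicts the most frequent
    training label. Stands in for Decision Tree / Naive Bayes here."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common


def cross_validate(texts, labels, fit, k=10, sampling="stratified", positive="positive"):
    """Train on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for test_idx in make_folds(labels, k, sampling):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predict = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        y_true.extend(labels[i] for i in test_idx)
        y_pred.extend(predict(texts[i]) for i in test_idx)
    return scores(y_true, y_pred, positive)


if __name__ == "__main__":
    # Toy, imbalanced data: 16 "positive" and 4 "negative" examples.
    labels = ["positive"] * 16 + ["negative"] * 4
    texts = [f"tweet {i}" for i in range(20)]
    for method in ("linear", "shuffled", "stratified"):
        acc, prec, rec = cross_validate(texts, labels, majority_class, sampling=method)
        print(f"{method:>10}: accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

Replacing `majority_class` with a real Decision Tree or Naïve Bayes learner that returns a prediction function leaves the fold assignment and scoring logic unchanged, which is what makes the three sampling methods directly comparable.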

Published
2023-12-23
How to Cite
I. T. Julianto, D. Kurniadi, and B. B. Balilo Jr, “Enhancing Sentiment Analysis with Chatbots: A Comparative Study of Text Pre-Processing,” J. Tek. Inform. (JUTIF), vol. 4, no. 6, pp. 1419-1430, Dec. 2023.