Development of a Hybrid Machine Learning-Based E-Commerce Chatbot Using Jaccard Similarity and K-Nearest Neighbor for Accurate Intent Classification
DOI:
https://doi.org/10.52436/1.jutif.2026.7.3.5659
Keywords:
Chatbot, E-Commerce, Hybrid Machine Learning, Jaccard Similarity, K-Nearest Neighbor
Abstract
The advancement of technology in the e-commerce industry requires fast and accurate information services, particularly through Natural Language Processing (NLP)-based chatbots. However, many existing chatbots rely on a single method, which often limits their ability to understand the context of user questions. This study proposes a hybrid approach that integrates Jaccard Similarity and K-Nearest Neighbor (K-NN) to improve answer retrieval accuracy and intent classification in e-commerce chatbot systems. Jaccard Similarity measures the similarity between user queries and Frequently Asked Questions (FAQ) data, while K-NN determines the intent from the nearest neighbors with the highest similarity values. The dataset, consisting of FAQ questions and answers, is preprocessed through case folding, tokenization, stopword removal, and stemming. System performance is evaluated using accuracy, precision, recall, and F1-score. The experimental results show that Jaccard Similarity effectively selects relevant answer candidates, achieving similarity values of up to 66%, while K-NN produces stable intent classification results. The proposed hybrid model achieved an accuracy of 87%, a precision of 86%, a recall of 85%, and an F1-score of 85%, outperforming single-method implementations. Furthermore, confidence score analysis indicates that most chatbot responses fall into the high-confidence category (>0.70). Rule-based NLP evaluation also provides insight into unclassified inputs, which can serve as a basis for future dataset development. The implementation results demonstrate that the chatbot system operates effectively on both the customer and admin sides and can be monitored through analytical features. Overall, the proposed hybrid approach enhances the reliability, relevance, and stability of chatbot responses, making it a practical and effective solution for real-time intent classification and FAQ retrieval in e-commerce customer service environments.
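The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the FAQ entries, the stopword list, the intent labels, the choice of k = 3, and the function names `preprocess`, `jaccard`, and `classify` are all assumptions made for illustration, and stemming is left out to keep the example self-contained. The sketch scores every FAQ question against the query with Jaccard Similarity, keeps the k most similar questions, and lets them vote on the intent.

```python
import re
from collections import Counter

# Hypothetical toy FAQ data (question, intent, answer); not taken from the paper.
FAQ = [
    ("How do I track my order?", "order_tracking", "Open 'My Orders' to see the shipping status."),
    ("Where is my package?", "order_tracking", "Open 'My Orders' to see the shipping status."),
    ("How can I return a product?", "returns", "Request a return from the order detail page."),
    ("What is the return policy?", "returns", "Request a return from the order detail page."),
    ("Which payment methods are accepted?", "payment", "We accept bank transfer and e-wallets."),
]

# Assumed stopword list; a real system would use a language-specific one.
STOPWORDS = {"how", "do", "i", "my", "is", "a", "the", "can", "what", "which", "are", "where", "to"}


def preprocess(text):
    """Case folding, tokenization, and stopword removal (stemming omitted here)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard Similarity |A ∩ B| / |A ∪ B| between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(query, k=3):
    """Return (intent, answer, score) from a majority vote of the k nearest FAQ entries."""
    q = preprocess(query)
    scored = sorted(
        ((jaccard(q, preprocess(question)), intent, answer) for question, intent, answer in FAQ),
        key=lambda item: item[0],
        reverse=True,
    )
    # Keep only neighbors that share at least one token with the query.
    neighbors = [n for n in scored[:k] if n[0] > 0]
    if not neighbors:
        return None, None, 0.0  # unclassified input
    votes = Counter(intent for _, intent, _ in neighbors)
    intent = votes.most_common(1)[0][0]
    # Answer with the most similar FAQ entry that belongs to the winning intent.
    best = next(n for n in neighbors if n[1] == intent)
    return intent, best[2], best[0]


if __name__ == "__main__":
    print(classify("How do I return my product?"))
```

In this sketch the returned score is simply the Jaccard value of the best-matching entry. One way to use it, again as an assumption, is to compare it against a cutoff such as the 0.70 high-confidence threshold mentioned in the abstract and to log inputs that return `None` as candidates for extending the FAQ dataset.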
License
Copyright (c) 2026 Andrian Sah, Andi Ilham, Rasna, Siti Nurhayati

This work is licensed under a Creative Commons Attribution 4.0 International License.





