Comparative Evaluation Of Sparse, Dense, And Hybrid Retrieval Models On Indonesian Wikipedia

Tino Saputra; Eric Julianto; Ari Widjonarko; Budi Tjahjono

doi:10.52436/1.jutif.2026.7.3.5776

Authors

Tino Saputra, Magister Computer, Computer Science, Esa Unggul University, Jakarta, Indonesia
Eric Julianto, Magister Computer, Computer Science, Esa Unggul University, Jakarta, Indonesia
Ari Widjonarko, Master of Data Science, Data Science and AI, Monash University, Melbourne, Australia
Budi Tjahjono, Computer Science, Esa Unggul University, Jakarta, Indonesia

Keywords:

Information Retrieval, BM25, TF-IDF, Hybrid Retrieval, Wikipedia Indonesia

Abstract

This study compares sparse, dense, and hybrid Information Retrieval (IR) models on the Indonesian Wikipedia corpus. The evaluated methods are TF-IDF and BM25 as sparse models, SBERT (MiniLM) as a dense retrieval model, and hybrid retrieval implemented through score fusion. The corpus contains 713,044 Wikipedia articles, and the experiments use 1,000 test queries. Performance is measured with Precision@10 (P@10) and Mean Reciprocal Rank (MRR). Among the single models, BM25 performs best, with a P@10 of 0.973 and an MRR of 0.9174, well ahead of TF-IDF and SBERT. Hybrid retrieval yields a slight further improvement: the BM25 + SBERT combination reaches a P@10 of 0.979 and an MRR of 0.9253 at higher values of the fusion weight α. These findings indicate that lexical matching remains dominant in encyclopedic corpora, while semantic representations contribute complementary gains. The gain from hybrid retrieval (0.006 in P@10 and 0.0079 in MRR over BM25 alone) is small relative to the additional computational cost of dense embedding and score fusion, which points to a trade-off between effectiveness and efficiency. For low-resource languages such as Indonesian, lexical retrieval therefore remains highly reliable, and hybrid approaches offer incremental improvements. The study thus offers practical guidance for building efficient, scalable, and reliable IR systems for Indonesian Wikipedia and other low-resource language corpora.
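
To make the score fusion and the two metrics concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how scores are normalized or which model α weights, so the min-max normalization, the convention that α weights the sparse (BM25) score, and all function names here are assumptions made for illustration.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map collapses to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(sparse: Dict[str, float], dense: Dict[str, float], alpha: float) -> List[str]:
    """Rank documents by alpha * sparse + (1 - alpha) * dense.

    Assumption: alpha weights the sparse side; the source does not say which.
    Documents missing from one score map contribute 0.0 for that side.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1.0 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], relevants: List[Set[str]]) -> float:
    """Average the reciprocal rank over all queries."""
    if not rankings:
        return 0.0
    pairs = zip(rankings, relevants)
    return sum(reciprocal_rank(r, rel) for r, rel in pairs) / len(rankings)


if __name__ == "__main__":
    # Toy scores for one query; the values are invented for illustration.
    bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
    sbert_scores = {"a": 0.2, "b": 0.9, "c": 0.4}
    ranking = fuse(bm25_scores, sbert_scores, alpha=0.8)
    print(ranking, precision_at_k(ranking, {"b"}, k=2), reciprocal_rank(ranking, {"b"}))
```

Under these assumptions, a larger α moves the fused ranking toward the BM25 ordering and a smaller α moves it toward the SBERT ordering. Reported P@10 and MRR would then be the per-query values averaged over all test queries.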
