Cybersecurity Risk Detection Based on Roblox User Review Analysis Using TF-IDF and Comparison of Naïve Bayes and Support Vector Machine

RG Guntur Alam; Huda Ibrahim

doi:10.52436/1.jutif.2026.7.2.5582

Authors

RG Guntur Alam (Information System, Universitas Muhammadiyah Bengkulu, Indonesia)
Huda Ibrahim (School of Computing, Universiti Utara Malaysia, Malaysia)

Keywords:

Cybersecurity Risk Detection, Naïve Bayes, Roblox, Sentiment Analysis, Support Vector Machine, User Reviews

Abstract

The rapid growth of online gaming platforms increases user engagement but also exposes users to technical and cybersecurity risks. User reviews are a rich yet underutilized textual source that can serve as early indicators of such risks. Unlike prior studies that focus on sentiment polarity, this study treats user reviews as early cybersecurity risk signals by mapping complaint patterns to operational security risk categories relevant to system developers. It compares Naïve Bayes (NB) and Support Vector Machine (SVM) for detecting cybersecurity risks in imbalanced textual data derived from Roblox user reviews. A total of 3,000 reviews were collected from the Google Play Store via web scraping and preprocessed using case folding, normalization, tokenization, stopword removal, and stemming. Reviews were assigned to four cybersecurity risk categories (account access issues, suspicious behavior, connection instability, and data loss) using rule-based security keyword mapping. Text was represented with TF-IDF over unigram and bigram features, and class imbalance was handled through undersampling. Models were evaluated on three train–test splits (80:20, 70:30, and 60:40) using accuracy, macro F1-score, AUC-PR, and training time, and the differences were tested for statistical significance. Results show that SVM consistently outperforms NB, achieving higher accuracy (0.86–0.88) and substantially better macro F1-scores (0.73–0.77), which indicates more balanced detection of minority risk categories. These differences are statistically significant (p < 0.05). The novelty of this study lies in transforming user reviews into a structured cybersecurity risk detection framework and empirically demonstrating the robustness of SVM in identifying rare but critical risks in imbalanced data.
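The abstract names the preprocessing, rule-based labeling, and TF-IDF steps without showing how they connect. The following minimal sketch, in dependency-free Python, illustrates one way to chain them. The keyword sets in `RISK_KEYWORDS`, the stopword list, the "most keyword hits wins" labeling rule, and all function names are hypothetical placeholders, not the study's lexicon or code; stemming is omitted for brevity. The weighting uses raw term counts, a smoothed IDF of ln((1+n)/(1+df)) + 1, and L2 normalization, which match scikit-learn's `TfidfVectorizer` defaults.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL keyword lexicon: the study's actual security keywords are not
# given in the abstract, so these entries are illustrative only.
RISK_KEYWORDS = {
    "account_access": {"hacked", "password", "login", "banned"},
    "suspicious_behavior": {"scam", "scammer", "phishing", "exploit"},
    "connection_instability": {"lag", "disconnect", "disconnected", "server"},
    "data_loss": {"lost", "deleted", "reset", "progress"},
}

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"the", "a", "an", "i", "my", "me", "is", "was", "and", "to", "of", "it"}


def preprocess(review):
    """Case folding, punctuation normalization, whitespace tokenization and
    stopword removal. Stemming is left out to stay dependency-free."""
    text = re.sub(r"[^a-z0-9\s]", " ", review.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def label_review(tokens):
    """Return the category with the most keyword hits, or None if no keyword
    matches. Ties go to the category listed first (an assumed rule)."""
    hits = {cat: sum(tok in kws for tok in tokens) for cat, kws in RISK_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def unigrams_and_bigrams(tokens):
    """Unigram features plus adjacent-token bigram features."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(token_lists):
    """Sparse TF-IDF vectors (dicts) with smoothed IDF and L2 normalization."""
    counts = [Counter(unigrams_and_bigrams(toks)) for toks in token_lists]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for c in counts:
        v = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({t: x / norm for t, x in v.items()})
    return vectors


reviews = [
    "My account got HACKED and I can't login!",
    "So much lag, I keep getting disconnected from the server",
    "Fun game, love it",
]
tokens = [preprocess(r) for r in reviews]
labels = [label_review(t) for t in tokens]
vectors = tfidf(tokens)
print(labels)  # ['account_access', 'connection_instability', None]
```

Reviews with no keyword hit (the third one here) would be treated as outside the four risk categories under this assumed rule. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` would replace the hand-written `tfidf` function before the vectors are passed to NB and SVM.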

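The abstract reports accuracy of 0.86–0.88 alongside macro F1-scores of only 0.73–0.77. A gap of this kind is typical under class imbalance: macro F1 averages the per-class F1 scores with equal weight, so missing a rare category pulls it down even when accuracy stays high. The sketch below uses stdlib-only Python with hypothetical function names and made-up toy labels (not the study's data) to show plain random undersampling and to contrast accuracy with macro F1. The abstract does not say whether undersampling was applied only to the training split; doing so, so that the test set keeps its natural distribution, is common practice and is assumed here.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=42):
    """Randomly drop examples from larger classes until every class has as
    many examples as the smallest one (plain random undersampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy, made-up labels: eight majority-class reviews and two rare risk categories.
y_true = ["connection"] * 8 + ["account", "data_loss"]
y_pred = ["connection"] * 8 + ["account", "connection"]  # the data_loss review is missed

print(accuracy(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.647

_, y_bal = random_undersample(list(range(10)), y_true)
print(sorted(y_bal))  # ['account', 'connection', 'data_loss']
```

The single missed `data_loss` review zeroes that category's F1 and lowers macro F1 to about 0.65, while accuracy remains 0.9. Library equivalents exist, for example `f1_score(y_true, y_pred, average="macro")` in scikit-learn and `RandomUnderSampler` in imbalanced-learn.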