Enhancing Cyberbullying Detection on Platform 'X' Using IndoBERT and Hybrid CNN-LSTM Model

Authors

  • Annisaa Alya Hafiza Informatics, Telkom University, Indonesia
  • Erwin Budi Setiawan Informatics, Telkom University, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.2.4321

Keywords:

Convolutional Neural Network (CNN), Cyberbullying Detection, FastText, IndoBERT, Long Short-Term Memory (LSTM), TF-IDF

Abstract

Cyberbullying has become widespread on social media platforms. It can take many forms, including hate speech, trolling, adult content, racism, harassment, and rants. One platform with many cyberbullies is Twitter, now renamed 'X'. The anonymity that 'X' affords lets users from all over the world commit cyberbullying, because they can share their thoughts and expressions freely without being held accountable under their own identity. This research explores how IndoBERT’s semantic features affect hybrid deep learning models for detecting cyberbullying on platform 'X', while integrating TF-IDF feature extraction and FastText feature expansion to improve text classification performance. The study uses a dataset of 30,084 tweets and a hybrid deep learning approach that combines CNN and LSTM. In the IndoBERT scenario, IndoBERT features were first combined with TF-IDF, then expanded using FastText, before being passed to the hybrid deep learning model. The highest test accuracy achieved by each model was: CNN (80.69%), LSTM (80.67%), CNN-LSTM (81.18%), and CNN-LSTM-IndoBERT (82.05%). This research contributes to informatics by integrating hybrid deep learning (CNN-LSTM) with IndoBERT and TF-IDF, demonstrating its effectiveness in improving cyberbullying detection in Indonesian text. Future research can explore other transformer-based models, such as RoBERTa or ALBERT, to enhance contextual understanding in cyberbullying classification.
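
The feature pipeline named in the abstract (TF-IDF extraction, expansion with similar words, combination with dense semantic features) can be illustrated with a minimal sketch. The abstract does not specify how the combination or the expansion is performed, so everything below is an assumption: the combination is taken to be plain concatenation, the expansion is taken to be a rule that fills a zero-weight TF-IDF slot when a similar word occurs in the tweet, the `similar` table is a hand-written stand-in for FastText nearest neighbours, and the `dense` vector is a placeholder for an IndoBERT embedding. The function names and toy tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) with smoothed TF-IDF weights for tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of docs that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


def expand_features(vocab, vector, doc, similar):
    """Assumed expansion rule: fill a zero-weight slot when a similar word is in the doc."""
    index = {t: i for i, t in enumerate(vocab)}
    expanded = list(vector)
    for term, i in index.items():
        if vector[i] != 0.0:
            continue
        # Borrow the largest original weight among similar words present in this doc.
        borrowed = [
            vector[index[s]]
            for s in similar.get(term, [])
            if s in doc and s in index
        ]
        if borrowed:
            expanded[i] = max(borrowed)
    return expanded


def combine(dense, sparse):
    """Assumed combination: concatenate dense (semantic) and sparse (TF-IDF) features."""
    return list(dense) + list(sparse)


# Toy tokenized tweets (hypothetical data, not from the study's 30,084-tweet dataset).
docs = [
    ["kamu", "bodoh", "sekali"],
    ["dia", "tolol", "banget"],
    ["selamat", "pagi", "semua"],
]
# Hand-written similarity table standing in for FastText nearest neighbours.
similar = {"bodoh": ["tolol"], "tolol": ["bodoh"]}
# Placeholder for an IndoBERT sentence embedding (real ones have hundreds of dimensions).
dense = [0.1, -0.2]

vocab, vectors = tfidf_vectors(docs)
expanded0 = expand_features(vocab, vectors[0], docs[0], similar)
expanded2 = expand_features(vocab, vectors[2], docs[2], similar)
features0 = combine(dense, expanded0)
```

In this toy run the first tweet contains "bodoh" but not "tolol", so the expansion fills the "tolol" slot with the weight of "bodoh", while the third tweet is left unchanged. A vector like `features0` is what would then be fed to a classifier such as the CNN-LSTM hybrid.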

Published

2025-04-26

How to Cite

[1]
A. A. Hafiza and E. B. Setiawan, “Enhancing Cyberbullying Detection on Platform 'X' Using IndoBERT and Hybrid CNN-LSTM Model,” J. Tek. Inform. (JUTIF), vol. 6, no. 2, pp. 655–672, Apr. 2025.