TEXT CLASSIFICATION OF BULLYING REPORTS USING NLP AND RANDOM FOREST.

  • Dasril Aldo Informatics Engineering, Faculty of Informatics, Telkom University, Indonesia
  • Adanti Wido Paramadini Biomedical Engineering, Faculty of Telecommunications and Electronics Engineering, Telkom University, Indonesia
  • M. Yoka Fathoni Information Technology, University Kuala Lumpur, Malaysia
Keywords: Bag of Words, Bullying, Natural Language Processing, Random Forest, Text Classification

Abstract

Bullying is a serious concern that needs to be addressed as early as possible, whether it takes the form of physical, verbal, social, or cyber bullying. This paper aims to classify bullying reports using Natural Language Processing (NLP) with a Bag of Words representation and the Random Forest algorithm. The study employs a quantitative methodology. A total of 4671 bullying reports are categorized into physical, verbal, social, cyber, and non-cyber bullying. The dataset is split into an 80% training set (3737 reports) and a 20% testing set (934 reports) and then classified using the Random Forest algorithm. The model achieved an accuracy of 94.76%, with recall, precision, and F1-score of 94.64%, 95.02%, and 94.97%, respectively. These results indicate that the model is effective at automatically detecting and classifying textual bullying reports. Although no such effort has been made in our institutions so far, automated handling of bullying reports is expected to be effective, because such a system would allow a school or institution to monitor bullying reports accurately and continuously. It would also allow immediate action to be taken to protect the victim before the situation escalates.
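
The steps named in the abstract (Bag of Words features, an 80/20 split, and precision, recall, and F1-score) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the tokenizer, the function names, the example sentences, and the use of macro averaging are all assumptions made for illustration, and the Random Forest classifier itself is omitted (in practice it would typically come from a library such as scikit-learn's `RandomForestClassifier`).

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


def split_sizes(n_samples, test_fraction=0.2):
    """Return (train_size, test_size) for a simple hold-out split."""
    n_test = round(n_samples * test_fraction)
    return n_samples - n_test, n_test


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 (averaging scheme assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical reports, invented for this example only.
    reports = ["He pushed me", "They pushed and pushed"]
    vocab = build_vocabulary(reports)
    print(bag_of_words("They pushed and pushed hard!", vocab))
    # Split sizes for the 4671 reports mentioned in the abstract.
    print(split_sizes(4671))  # (3737, 934)
```

Each report becomes a fixed-length count vector over the training vocabulary, which is the input a tree-based classifier would consume. The split sizes reproduce the 3737 and 934 reports stated in the abstract.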

Published
2025-02-10
How to Cite
[1]
D. Aldo, A. W. Paramadini, and M. Y. Fathoni, “TEXT CLASSIFICATION OF BULLYING REPORTS USING NLP AND RANDOM FOREST,” J. Tek. Inform. (JUTIF), vol. 6, no. 1, pp. 23–30, Feb. 2025.