Classification of Eyewitness Social Media Messages for Natural Disaster Monitoring using BERT Variants
DOI:
https://doi.org/10.52436/1.jutif.2026.7.3.5317
Keywords:
BERT, Classification, Masked Language Modeling, Transformer, Twitter/X
Abstract
The rapid growth of disaster-related social media data demands effective monitoring. However, this real-time source is difficult to exploit because it produces large volumes of unstructured and noisy data. This study aims to improve monitoring by using BERT variants to classify eyewitness reports on Twitter/X. Earlier studies have applied machine-learning and deep-learning models to automate the monitoring of eyewitness messages on social media, but these models still have shortcomings. Traditional machine-learning models rely on handcrafted and frequency-based features, limiting their ability to capture contextual semantics. Deep-learning models offer improved performance but still face challenges in modeling long-range dependencies, an issue that is pronounced in high-volume social media streams. This study employs transformer-based models using several BERT variants (BERT, RoBERTa, DistilBERT, ELECTRA, and ALBERT). Each model is pre-trained with the Masked Language Modeling (MLM) objective, and batch-size optimization is applied to boost performance. Experimental results indicate that a batch size of 16 consistently yields the best performance, with the standard BERT model achieving the highest macro-F1 score of 0.762. By disaster type, macro-F1 scores reach 0.744 for hurricane, 0.793 for flood, 0.756 for earthquake, and 0.750 for wildfire. BERT with a batch size of 16 outperforms the other BERT variants and twelve baseline models from prior research. Unlike previous approaches, this study leverages pre-trained Masked Language Models to optimize classification on disaster-related datasets. The findings contribute to the development of transformer-based architectures for text classification in real-time disaster informatics, leading to more accurate situational awareness and reduced delays in emergency decision-making.
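The results above are reported as macro-F1, which averages the per-class F1 scores with equal weight so that minority classes count as much as majority ones. The following is a minimal sketch of that metric, not the authors' evaluation code; the class names and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels and predictions (assumption, not data from the study)
y_true = ["eyewitness", "eyewitness", "non-eyewitness",
          "non-eyewitness", "unknown", "unknown"]
y_pred = ["eyewitness", "non-eyewitness", "non-eyewitness",
          "non-eyewitness", "unknown", "eyewitness"]

print(round(macro_f1(y_true, y_pred), 3))
```

In this toy case the per-class F1 scores are 0.5, 0.8, and 2/3, so the macro average is about 0.656. A comparison across batch sizes, as described in the abstract, would compute this score once per trained configuration on the same test split.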
License
Copyright (c) 2026 Muhammad Bashir Hanafi, Mohammad Reza Faisal, Friska Abadi, Irwan Budiman, Setyo Wahyu Saputro, Njideka Nkemdilim Mbeledogu

This work is licensed under a Creative Commons Attribution 4.0 International License.





