Enhancing Fake News Detection on Imbalanced Data Using Resampling Techniques and Classical Machine Learning Models

Dodo Zaenal Abidin; Agus Siswanto; Chindra Saputra; Betantiyo; Afrizal Nehemia Toscany

doi:10.52436/1.jutif.2025.6.5.5177

Authors

Dodo Zaenal Abidin, Magister of Information System, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
Agus Siswanto, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
Chindra Saputra, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
Betantiyo, Magister of Information System, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
Afrizal Nehemia Toscany, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.5.5177

Keywords:

class imbalance, fake news classification, logistic regression, resampling techniques, random forest, support vector machine

Abstract

Class imbalance remains a critical challenge in fake news detection, particularly in domains such as entertainment media where class distributions are highly skewed. This study evaluates seven resampling settings (Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek Links, NearMiss, and a no-resampling baseline) applied to three classical machine learning models: Logistic Regression, Support Vector Machine (SVM), and Random Forest. Using the imbalanced GossipCop dataset of 24,102 news headlines, the proposed pipeline combines TF-IDF vectorization, stratified 3-fold cross-validation, and five evaluation metrics: F1-score, precision, recall, ROC AUC, and PR AUC. Experimental results show that oversampling methods, particularly SMOTE and Random Oversampling, substantially improve detection of the minority class (fake news). Among all model–resampling combinations, SVM with SMOTE achieved the highest performance (F1-score = 0.67, PR AUC = 0.74), indicating that it handles imbalanced short-text classification robustly. Conversely, undersampling methods frequently reduced recall, especially with ensemble models such as Random Forest. The approach improves model robustness in fake news detection on skewed datasets and contributes a reproducible, domain-specific framework for developing more reliable misinformation classifiers.
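
The resampling-within-cross-validation idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: it covers only Random Oversampling, stratified 3-fold splitting, and the precision/recall/F1 computation, and it leaves out TF-IDF and the classifiers. The function names (`stratified_folds`, `random_oversample`, `precision_recall_f1`) and the toy 90/30 label split are hypothetical. The sketch also assumes that resampling is applied to the training folds only, which the abstract does not state explicitly.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate random minority-class indices until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (here: fake news) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (assumption): 0 = real, 1 = fake, with a 3:1 imbalance.
labels = [0] * 90 + [1] * 30
folds = stratified_folds(labels, k=3)
for i, test_idx in enumerate(folds):
    # Training data is every fold except the held-out one.
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # Resample the training portion only; the test fold keeps its natural skew.
    balanced_idx = random_oversample(train_idx, labels, seed=i)
    before = Counter(labels[idx] for idx in train_idx)
    after = Counter(labels[idx] for idx in balanced_idx)
    print(f"fold {i}: train {dict(before)} -> resampled {dict(after)}, test size {len(test_idx)}")
```

Keeping the held-out fold untouched means the reported metrics reflect the original class skew. In practice, a pipeline like the one described would more likely rely on established libraries for the samplers, vectorizer, and models rather than hand-written helpers such as these.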
