Evaluating Ensemble Versus Non-Ensemble Machine Learning Performance with Preprocessing Techniques for IoT Intrusion Detection on CICIoT2023

Authors

  • Febrian Sabila Firdaus, Informatic and Computer Engineering Education, Universitas Sebelas Maret, Indonesia
  • Puspanda Hatta, Informatic and Computer Engineering Education, Universitas Sebelas Maret, Indonesia
  • Basori, Informatic and Computer Engineering Education, Universitas Sebelas Maret, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.1.5408

Keywords:

CICIoT2023, Ensemble Learning, Feature Selection, Intrusion Detection, Machine Learning, Preprocessing

Abstract

The rapid expansion of the Internet of Things (IoT) introduces significant security vulnerabilities and exposes networks to sophisticated attacks. Effective Intrusion Detection Systems (IDS) are therefore critical, yet many machine learning benchmarks still rely on outdated datasets. This study presents a comparative evaluation of ensemble and non-ensemble machine learning models for multiclass attack classification on the recent and complex CICIoT2023 dataset. Preprocessing includes random undersampling to address extreme class imbalance and a hybrid feature selection approach that combines Mutual Information (MI) with Random Forest Feature Importance (RFFI). The models, including Naive Bayes, Logistic Regression, SVM, Random Forest, and XGBoost, were evaluated with stratified 5-fold cross-validation and default hyperparameters. The results show that ensemble models consistently and significantly outperform non-ensemble models. XGBoost achieved the highest and most stable performance, with a mean F1-score of 0.8889 ± 0.0008 across the five folds and a macro F1-score of 0.8891 on the test set. These findings support the advantage of ensemble methods for complex IoT traffic and quantify the role of preprocessing. Notably, feature scaling proved essential for non-ensemble models, raising Logistic Regression's F1-score from an unstable 0.6280 to 0.7691.
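
The evaluation steps named in the abstract (random undersampling, stratified 5-fold splitting, and macro F1 scoring) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names are hypothetical, the label counts are made up, and balancing every class down to the size of the rarest class is an assumption, since the abstract does not state the undersampling target.

```python
import random
from collections import Counter, defaultdict


def random_undersample(labels, seed=0):
    """Return sorted indices that keep an equal number of samples per class.

    Assumption: every class is cut down to the size of the rarest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that roughly preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up label distribution (not the CICIoT2023 class counts).
labels = ["ddos"] * 1000 + ["mirai"] * 200 + ["benign"] * 50
kept = random_undersample(labels)
balanced = [labels[i] for i in kept]
print("after undersampling:", Counter(balanced))
for fold, (train_idx, test_idx) in enumerate(stratified_kfold(balanced, k=5)):
    print(fold, len(train_idx), Counter(balanced[i] for i in test_idx))
```

In a real experiment a classifier would be fitted on each training split and `macro_f1` computed on the matching test split; the mean and standard deviation of those five scores correspond to the "mean ± std" style of figure reported above. Macro averaging weights every attack class equally, so a model that performs poorly on rare classes is penalized even when overall accuracy is high.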

Published

2026-02-15

How to Cite

[1]
F. S. Firdaus, P. Hatta, and B. Basori, “Evaluating Ensemble Versus Non-Ensemble Machine Learning Performance with Preprocessing Techniques for IoT Intrusion Detection on CICIoT2023”, J. Tek. Inform. (JUTIF), vol. 7, no. 1, pp. 606–618, Feb. 2026.