Stacked Random Forest-LightGBM for Web Attack Classification
DOI:
https://doi.org/10.52436/1.jutif.2025.6.5.4950
Keywords:
LightGBM, Random Forest, SMOTE, Stacking Hybrid, Web Attack Classification
Abstract
The rapid expansion of web services in the digital era has intensified exposure to increasingly complex and imbalanced cyber threats. This study proposes a stacking hybrid ensemble framework for web attack classification, integrating Random Forest as the base learner and LightGBM as the meta-learner, enhanced by the SMOTE technique for data balancing. The Web Attack subset of the CICIDS-2017 dataset serves as a case study, with a focus on detecting minority attacks such as SQL Injection, XSS, and Brute Force. The preprocessing pipeline includes data cleaning, removal of irrelevant features, normalization, extreme value imputation, and ANOVA F-test-based feature selection. Evaluation results indicate that the proposed model outperforms baseline models in both multiclass classification (98.7% accuracy, 0.634 macro F1-score) and binary classification (99.41% accuracy, 99.47% F1-score), while maintaining high sensitivity to minority classes. These results contribute to informatics and cybersecurity scholarship through a generalizable stacking baseline and well-specified evaluation procedures for web-attack detection, facilitating replicability, fair comparison, and dataset-agnostic insights.
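The gap between the reported multiclass accuracy (98.7%) and macro F1-score (0.634) is typical of heavily imbalanced data: accuracy is dominated by the majority class, while macro F1 weights every class equally. The following is a minimal sketch of that effect in plain Python. The class names and label counts are hypothetical toy values chosen for illustration; they are not taken from CICIDS-2017 or from the study's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 970 benign flows and 30 attack flows in three classes.
y_true = ["benign"] * 970 + ["brute_force"] * 20 + ["xss"] * 8 + ["sqli"] * 2
y_pred = (
    ["benign"] * 970          # all benign flows classified correctly
    + ["brute_force"] * 18    # 18 of 20 brute-force flows detected
    + ["benign"] * 2          # 2 brute-force flows missed
    + ["xss"] * 4             # 4 of 8 XSS flows detected
    + ["brute_force"] * 4     # 4 XSS flows confused with brute force
    + ["benign"] * 2          # both SQL injection flows missed
)

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {mf1:.3f}")
```

In this toy case a classifier that misses every sample of the rarest class still scores above 99% accuracy, while its macro F1 falls to roughly 0.63. This is why results on imbalanced attack data are usually read through both metrics together.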
License
Copyright (c) 2025 Fadli Dony Pradana, Farikhin, Budi Warsito

This work is licensed under a Creative Commons Attribution 4.0 International License.