Comparison of SVM and Gradient Boosting with PCA for Website Phishing Detection
DOI:
https://doi.org/10.52436/1.jutif.2025.6.2.4344
Keywords:
Classification, Gradient Boosting, Phishing Website, Principal Component Analysis, Support Vector Machine
Abstract
Growing internet use has been accompanied by a rise in phishing attacks, which threaten the security of user data. This study compares the performance of the Support Vector Machine (SVM) and Gradient Boosting algorithms, each combined with Principal Component Analysis (PCA) for dimensionality reduction, in classifying phishing websites. The dataset consists of 11,054 samples labeled as phishing (1) or non-phishing (-1), and is evaluated under three train/test partition scenarios: 70:30, 80:20, and 90:10. Experimental results indicate that SVM outperforms Gradient Boosting in accuracy and recall, particularly in detecting phishing websites. In the 80:20 and 70:30 scenarios, the SVM model achieved an accuracy of 96% to 97% and a higher recall for the phishing class, making it more sensitive to phishing websites. Gradient Boosting performed consistently, with an accuracy of around 94% and precision and recall balanced across both classes. SVM is therefore the better choice for phishing detection tasks that require high sensitivity to phishing websites, while Gradient Boosting remains a viable alternative when balanced performance between phishing and non-phishing sites is needed. The study concludes that both algorithms can be used effectively for phishing detection, and that further experiments and hyperparameter tuning may improve the results.
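The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data from `make_classification`, the standardization step, the RBF kernel, the choice of 10 principal components, and the helper name `compare_models` are all assumptions introduced here. Only the two algorithms, the use of PCA, the 1/-1 labels, and the 70:30, 80:20, and 90:10 partitions come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, test_sizes=(0.30, 0.20, 0.10), n_components=10, seed=0):
    """Hypothetical helper: returns {(model, test_size): (accuracy, phishing_recall)}."""
    results = {}
    for test_size in test_sizes:
        # Stratified split so both classes keep their proportions in each partition.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        models = {
            "svm": SVC(kernel="rbf"),  # kernel choice is an assumption
            "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        }
        for name, model in models.items():
            # Scaling and PCA are fitted on the training split only, inside the pipeline.
            pipeline = make_pipeline(
                StandardScaler(), PCA(n_components=n_components), model
            )
            pipeline.fit(X_train, y_train)
            y_pred = pipeline.predict(X_test)
            results[(name, test_size)] = (
                accuracy_score(y_test, y_pred),
                recall_score(y_test, y_pred, pos_label=1),  # recall on phishing (1)
            )
    return results


# Synthetic stand-in data (assumption): the study's 11,054-sample dataset is not used here.
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
y = 2 * y - 1  # relabel {0, 1} as {-1, 1} to match the abstract's class encoding

results = compare_models(X, y)
for (name, test_size), (acc, rec) in sorted(results.items()):
    print(f"{name:18s} test={test_size:.2f} accuracy={acc:.3f} phishing_recall={rec:.3f}")
```

Placing PCA inside the pipeline keeps the projection from seeing the test split, so the reported scores are not inflated by leakage. The numbers printed for the synthetic data say nothing about the study's reported results.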
License
Copyright (c) 2025 Nur Aini Syam, Nurhikma Arifin, Wawan Firgiawan, Muhammad Furqan Rasyid

This work is licensed under a Creative Commons Attribution 4.0 International License.