Comparison of SVM and Gradient Boosting with PCA for Website Phishing Detection

Authors

  • Nur Aini Syam, Informatics, Universitas Sulawesi Barat, Indonesia
  • Nurhikma Arifin, Informatics, Universitas Sulawesi Barat, Indonesia
  • Wawan Firgiawan, Informatics, Universitas Sulawesi Barat, Indonesia
  • Muhammad Furqan Rasyid, Information Science, Nara Institute of Science and Technology, Japan

DOI:

https://doi.org/10.52436/1.jutif.2025.6.2.4344

Keywords:

Classification, Gradient Boosting, Phishing Website, Principal Component Analysis, Support Vector Machine

Abstract

The increasing use of the internet has led to a rise in phishing attacks, posing a threat to user data security. This study compares the performance of the Support Vector Machine (SVM) and Gradient Boosting algorithms, integrated with Principal Component Analysis (PCA) for dimensionality reduction, in classifying phishing websites. The dataset consists of 11,054 samples classified into two categories: phishing (1) and non-phishing (-1), with three data partition scenarios for training and testing: 70:30, 80:20, and 90:10. Experimental results indicate that SVM outperforms Gradient Boosting in terms of accuracy and recall, particularly in detecting phishing websites. In the 80:20 and 70:30 data partition scenarios, the SVM model achieved an accuracy of 96% to 97% and had a higher recall for phishing websites, making it more sensitive to phishing detection. However, Gradient Boosting demonstrated consistent performance with an accuracy of around 94%, providing a balanced result between precision and recall for both classes. Therefore, the SVM model is superior for phishing detection tasks requiring high sensitivity to phishing websites, while Gradient Boosting remains a viable alternative when a more balanced performance between phishing and non-phishing sites is needed. The study concludes that both algorithms can be effectively used for phishing detection, with potential improvements through further experiments and hyperparameter tuning.
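The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, a synthetic stand-in dataset with 30 numeric features, standardization before PCA, 10 principal components, an RBF kernel, and default hyperparameters. None of these settings are stated in the abstract. Only the class labels (1 for phishing, -1 for non-phishing) and the three train/test partitions come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the phishing dataset (assumption: 30 numeric features).
X, y01 = make_classification(
    n_samples=600, n_features=30, n_informative=10, random_state=0
)
y = 2 * y01 - 1  # relabel {0, 1} -> {-1, 1}: -1 = non-phishing, 1 = phishing


def evaluate(test_size, n_components=10):
    """Fit both PCA-based pipelines on one partition and score them on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=0
    )
    models = {
        "SVM": SVC(kernel="rbf"),  # kernel choice is an assumption
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    }
    results = {}
    for name, clf in models.items():
        # Scaling and PCA are fitted on the training part only, inside the pipeline.
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Recall on the phishing class (label 1): the sensitivity discussed above.
            "phishing_recall": recall_score(y_test, pred, pos_label=1),
        }
    return results


# Test fractions corresponding to the 70:30, 80:20, and 90:10 partitions.
results_by_split = {ts: evaluate(ts) for ts in (0.30, 0.20, 0.10)}

for test_size, results in results_by_split.items():
    for name, scores in results.items():
        print(f"test={test_size:.2f} {name}: "
              f"acc={scores['accuracy']:.3f} recall={scores['phishing_recall']:.3f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline keeps the dimensionality reduction from seeing the test data. The numbers printed for synthetic data say nothing about the accuracies reported in the study.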

Published

2025-04-26

How to Cite

[1]
N. A. Syam, N. Arifin, W. Firgiawan, and M. F. Rasyid, “Comparison of SVM and Gradient Boosting with PCA for Website Phishing Detection”, J. Tek. Inform. (JUTIF), vol. 6, no. 2, pp. 691–708, Apr. 2025.