IMPROVING HEART DISEASE PREDICTION ACCURACY USING PRINCIPAL COMPONENT ANALYSIS (PCA) IN MACHINE LEARNING ALGORITHMS

  • Zirji Jayidan, Informatics Department, Faculty of Computer Sciences, Universitas Buana Perjuangan Karawang, Indonesia
  • Amril Mutoi Siregar, Informatics Department, Faculty of Computer Sciences, Universitas Buana Perjuangan Karawang, Indonesia
  • Sutan Faisal, Informatics Department, Faculty of Computer Sciences, Universitas Buana Perjuangan Karawang, Indonesia
  • Hanny Hikmayanti, Informatics Department, Faculty of Computer Sciences, Universitas Buana Perjuangan Karawang, Indonesia
Keywords: Diagnostic Accuracy, Feature Extraction, Heart Disease Prediction, Machine Learning Algorithm, Principal Component Analysis (PCA)

Abstract

This study aims to improve the accuracy of heart disease prediction by combining Principal Component Analysis (PCA) for feature extraction with several machine learning algorithms. The dataset consists of 334 rows with 49 attributes, 5 classes, and 31 target diagnoses. Five algorithms were evaluated: K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). The results show that the algorithms combined with PCA achieve high accuracy, with RF, LR, and DT reaching an accuracy of up to 1.00. These findings highlight the potential of PCA-based machine learning models for the early diagnosis of heart disease.
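The workflow summarized in the abstract (PCA for feature extraction followed by five classifiers) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic dataset that only mimics the reported shape (334 rows, 49 attributes, 5 classes), standardization before PCA, a 95% explained-variance threshold, and an 80/20 stratified split. None of these settings are given in the abstract, and the resulting scores say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data. Only the shape (334 x 49, 5 classes)
# follows the abstract; the values are random and not the study's dataset.
X, y = make_classification(
    n_samples=334, n_features=49, n_informative=10, n_redundant=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# ASSUMPTION: 80/20 stratified hold-out split (not stated in the abstract).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The five algorithms named in the abstract, with illustrative settings.
models = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, clf in models.items():
    # Scale, then keep enough principal components to explain 95% of the
    # variance (ASSUMPTION), then classify. Fitting the scaler and PCA inside
    # the pipeline means they only ever see the training split.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # test-set accuracy

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier is a deliberate design choice in this sketch: the projection is learned from the training rows only, so the held-out accuracy is not inflated by information from the test rows.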

Published
2024-06-04
How to Cite
[1]
Z. Jayidan, A. M. Siregar, S. Faisal, and H. Hikmayanti, “IMPROVING HEART DISEASE PREDICTION ACCURACY USING PRINCIPAL COMPONENT ANALYSIS (PCA) IN MACHINE LEARNING ALGORITHMS”, J. Tek. Inform. (JUTIF), vol. 5, no. 3, pp. 821-830, Jun. 2024.