PCOS DISEASE CLASSIFICATION USING FEATURE SELECTION RFECV AND EDA WITH KNN ALGORITHM METHOD
Abstract
Polycystic ovary syndrome (PCOS) is an endocrine disorder of the ovaries that causes hormonal disturbances in women of reproductive age; in women with PCOS, ovarian androgen secretion is excessive compared with that of women without the condition. It usually occurs in women with obesity and is characterized by irregular menstrual cycles, chronic anovulation, hyperandrogenism, and even infertility. Current treatments include hormone therapy, laparoscopic ovarian drilling, and in-vitro fertilization. However, these three therapies focus on relieving symptoms and are less effective in treating PCOS-related infertility. Detecting PCOS early is therefore essential so that prevention and treatment can begin immediately. This study builds a classifier that detects PCOS from patient data with a high degree of accuracy. Classification is performed with the K-Nearest Neighbors (KNN) algorithm, preceded by two feature selection steps: Exploratory Data Analysis (EDA), which examines the data to identify the most accurate approach, and Recursive Feature Elimination with Cross-Validation (RFECV), which ranks the features by their importance to the prediction. EDA yields 10 attributes, RFECV reduces these to the 7 most important attributes, and the KNN classifier trained on them achieves an accuracy of 93%, precision of 82%, recall of 100%, and an F1 score of 90%.
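The RFECV-then-KNN stage described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification` as a stand-in for the 10 attributes kept after EDA), the 80/20 split and `n_neighbors=5` are arbitrary choices, and because KNN exposes no feature importances, a logistic regression is assumed as the ranking estimator inside RFECV. The abstract does not say which estimator the study used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 attributes retained after EDA (NOT the PCOS data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# KNN is distance-based, so features are standardized using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# RFECV needs an estimator with coef_ or feature_importances_;
# logistic regression is an assumption made for this sketch.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                          # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5),  # cross-validation picks the feature count
    scoring="accuracy",
)
selector.fit(X_train_s, y_train)

# Train and evaluate KNN on the selected features only.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(selector.transform(X_train_s), y_train)
y_pred = knn.predict(selector.transform(X_test_s))

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print("features kept:", selector.n_features_)
print("feature ranking (1 = selected):", selector.ranking_)
print(metrics)
```

The scores printed here describe the synthetic data only and say nothing about PCOS. As a consistency check on the reported figures, F1 = 2 × 0.82 × 1.00 / (0.82 + 1.00) ≈ 0.90, which agrees with the stated precision, recall, and F1 score.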
Copyright (c) 2023 Nadhira Triadha Pitaloka, Kusnawi Kusnawi
This work is licensed under a Creative Commons Attribution 4.0 International License.