Comparison of ANOVA and Chi-Square Feature Selection Methods to Improve Machine Learning Performance in Anemia Classification
DOI: https://doi.org/10.52436/1.jutif.2025.6.4.5017
Keywords: Anemia, Classification, Improvement, Machine Learning, Performance
Abstract
Anemia is a prevalent hematological condition marked by decreased hemoglobin concentration in the blood, which can lead to serious health complications if undetected. Although machine learning has shown potential in supporting early diagnosis, its effectiveness is often hindered by irrelevant or excessive features. This study investigates how the ANOVA and Chi-Square feature selection methods affect the performance of three machine learning algorithms, Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), in anemia classification. Using a Kaggle dataset of 15,300 instances and 25 features, each model was evaluated on accuracy, precision, recall, and F1-score, both before and after feature selection. Experimental results show a substantial improvement in classification performance after feature selection, with the SVM + ANOVA combination achieving the highest accuracy of 94.61%. In contrast, models without feature selection scored below 90%, highlighting the need for appropriate feature reduction techniques. This study contributes a comparative analysis framework for medical data classification, emphasizing the role of statistical feature selection in optimizing model accuracy. Its novelty lies in demonstrating consistent performance improvement across algorithms on real-world anemia data and in providing evidence that ANOVA and Chi-Square can significantly enhance model generalization in medical diagnostic contexts.
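To make the two selection criteria concrete, the following is a minimal pure-Python sketch of how an ANOVA F-score and a Chi-Square score can rank features against a class label. It is not the authors' implementation, which the abstract does not describe. The function names (`anova_f`, `chi2_score`, `select_k_best`) and the six-row toy data are hypothetical, and the Chi-Square variant shown (per-class feature sums compared with the sums expected under independence) is one common formulation that assumes non-negative feature values.

```python
from collections import defaultdict

def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature across classes."""
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the samples around their own class mean.
    # Assumes this is non-zero, i.e. the feature varies within a class.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def chi2_score(values, labels):
    """Chi-square statistic on per-class feature sums (values must be >= 0)."""
    observed = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, labels):
        observed[c] += v
        counts[c] += 1
    total, n = sum(values), len(values)
    score = 0.0
    for c in counts:
        expected = total * counts[c] / n  # class share of the feature total
        score += (observed[c] - expected) ** 2 / expected
    return score

def select_k_best(X, y, score_fn, k):
    """Return the column indices of the k highest-scoring features."""
    columns = [list(col) for col in zip(*X)]
    scores = [score_fn(col, y) for col in columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: column 0 mimics a hemoglobin-like measurement,
# column 1 is an uninformative feature. Label 1 = anemic, 0 = not anemic.
X = [[14.0, 5.0], [13.5, 6.0], [15.0, 5.0],
     [9.0, 6.0], [8.5, 5.0], [10.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

anova_selected = select_k_best(X, y, anova_f, k=1)
chi2_selected = select_k_best(X, y, chi2_score, k=1)
print("ANOVA keeps columns:", anova_selected)
print("Chi-Square keeps columns:", chi2_selected)
```

In a full experiment, the selected columns would be passed to each classifier and the metrics compared against a run on all features. Library routines such as scikit-learn's `f_classif` and `chi2` scorers serve the same purpose at scale.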
License
Copyright (c) 2025 Tiko Nur Annisa, Jasmir Jasmir, Nurhadi Nurhadi

This work is licensed under a Creative Commons Attribution 4.0 International License.