IMPROVING PERFORMANCE OF STUDENTS’ GRADE CLASSIFICATION MODEL USES NAÏVE BAYES GAUSSIAN TUNING MODEL AND FEATURE SELECTION
Abstract
Student grades are a relevant variable for predicting students' academic performance. Achieving good student performance requires analyzing and evaluating the factors that influence it. When an educator can predict students' academic performance early, the educator can adjust teaching methods so that learning runs effectively. The purpose of this research is to determine the interrelationships between variables, identify which variables have an effect, and use that analysis as a feature selection technique. We then review one of the most popular classifiers, Gaussian Naïve Bayes (GNB). Next, we survey feature selection models and discuss the feature selection approach. In this study, we classify student grades based on existing features to evaluate student performance, so that the results can guide educators in selecting learning methods and assist students in planning their learning process. The results show that GNB without feature selection achieves lower accuracy than GNB with feature selection (the reported figures are 10.12% and 10.12%, respectively).
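The pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' code or data: the toy records, the Pearson-correlation filter with a 0.3 threshold, the helper names `select_features`, `project`, `accuracy`, and `SimpleGaussianNB`, and the `var_smoothing` parameter (named after the corresponding scikit-learn hyperparameter). The sketch filters features by their correlation with the grade label, fits a Gaussian Naïve Bayes model on the remaining features, and picks a smoothing value on a small hold-out set.

```python
import math
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(X, y, threshold=0.3):
    """Indices of columns whose |correlation| with y reaches the threshold."""
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]


def project(X, indices):
    """Keep only the selected columns of each row."""
    return [[row[j] for j in indices] for row in X]


class SimpleGaussianNB:
    """Hypothetical from-scratch Gaussian Naive Bayes (illustration only)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        # Smoothing term: a fraction of the largest feature variance.
        # Assumes at least one feature is not constant.
        eps = self.var_smoothing * max(pvariance(col) for col in zip(*X))
        self.stats = {}
        for label in sorted(set(y)):
            rows = [row for row, target in zip(X, y) if target == label]
            params = [(mean(col), pvariance(col) + eps) for col in zip(*rows)]
            self.stats[label] = (math.log(len(rows) / len(X)), params)
        return self

    def _log_posterior(self, label, row):
        log_prior, params = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, (mu, var) in zip(row, params)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(c, row)) for row in X]


def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)


# Toy data (invented): [study hours, absences, unrelated value]; 0 = fail, 1 = pass.
X_train = [[2, 10, 5], [3, 8, 1], [1, 12, 4], [4, 9, 2],
           [8, 2, 5], [9, 1, 1], [7, 3, 4], [10, 0, 2]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[3, 9, 3], [9, 1, 3]]
y_test = [0, 1]

# Step 1: correlation-based feature selection on the training data.
selected = select_features(X_train, y_train)
X_train_sel, X_test_sel = project(X_train, selected), project(X_test, selected)

# Step 2: tune var_smoothing on the hold-out set (ties keep the first candidate).
candidates = (1e-9, 1e-3, 1e-1)
best_vs = max(
    candidates,
    key=lambda vs: accuracy(SimpleGaussianNB(vs).fit(X_train_sel, y_train), X_test_sel, y_test),
)

# Step 3: fit the final model and classify the hold-out records.
model = SimpleGaussianNB(best_vs).fit(X_train_sel, y_train)
predictions = model.predict(X_test_sel)
print("selected feature indices:", selected)
print("predictions:", predictions)
```

On real data, cross-validation would be a more reliable way to compare smoothing values than a two-record hold-out set, and the correlation threshold would itself need to be justified or tuned.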
Copyright (c) 2023 M Hafidz Ariansyah, Esmi Nur Fitri, Sri Winarno
This work is licensed under a Creative Commons Attribution 4.0 International License.