CLASSIFICATION MODELS FOR ACADEMIC PERFORMANCE: A COMPARATIVE STUDY OF NAÏVE BAYES AND RANDOM FOREST ALGORITHMS IN ANALYZING UNIVERSITY OF LAMPUNG STUDENT GRADES
Abstract
At the end of every semester, the university provides students with a comprehensive assessment of their academic achievement in each completed course. This study compared the effectiveness of two classification methods, Naïve Bayes and Random Forest, in classifying student learning outcomes. The research process comprised four stages: data selection, data preparation, model building and testing, and model evaluation. Both methods achieved higher accuracy with a data splitting strategy than with k-fold cross-validation. In this evaluation, Random Forest was the stronger classifier of University of Lampung student grades, achieving an accuracy of 99.38%. Notably, both methods improved substantially when Gradient Boosting was applied: Naïve Bayes attained an accuracy of 99.89%, while Random Forest reached 99.45%. The results indicate that the Random Forest classification method consistently delivers superior performance in identifying and classifying student grades. Furthermore, Gradient Boosting proved effective in enhancing the accuracy of the classification methods. These findings contribute to the understanding and advancement of evaluation systems for assessing student learning outcomes in the university environment.
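The two evaluation strategies compared in the abstract, a single data split and k-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic stand-ins for student records, all function names (`make_data`, `fit_gnb`, `holdout_accuracy`, `kfold_accuracy`) are hypothetical, and only a simple Gaussian Naïve Bayes classifier is shown. Random Forest and Gradient Boosting are omitted for brevity, so the printed accuracies say nothing about the figures reported in the study.

```python
import math
import random
from statistics import mean, pvariance

def make_data(n_per_class=100, seed=0):
    """Synthetic stand-in for student records: two numeric scores, two grade classes."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, 55.0), (1, 80.0)):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(center, 6.0), rng.gauss(center, 6.0)], label))
    rng.shuffle(rows)
    return rows

def fit_gnb(rows):
    """Estimate the prior, feature means, and feature variances for each class."""
    model = {}
    for label in {y for _, y in rows}:
        feats = [x for x, y in rows if y == label]
        cols = list(zip(*feats))
        model[label] = (
            len(feats) / len(rows),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-9 for c in cols],  # small floor avoids division by zero
        )
    return model

def predict_gnb(model, x):
    """Pick the class with the highest log-posterior, treating features as independent."""
    def log_post(prior, mus, variances):
        total = math.log(prior)
        for v, mu, var in zip(x, mus, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_post(*model[label]))

def accuracy(train, test):
    """Fit on the training rows and return the fraction of test rows classified correctly."""
    model = fit_gnb(train)
    return sum(predict_gnb(model, x) == y for x, y in test) / len(test)

def holdout_accuracy(rows, test_ratio=0.2):
    """Data splitting: train on the first part, test once on the held-out remainder."""
    cut = int(len(rows) * (1 - test_ratio))
    return accuracy(rows[:cut], rows[cut:])

def kfold_accuracy(rows, k=5):
    """k-fold cross-validation: each fold serves as the test set exactly once."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(train, folds[i]))
    return mean(scores)

data = make_data()
print(f"hold-out accuracy: {holdout_accuracy(data):.3f}")
print(f"5-fold accuracy:   {kfold_accuracy(data):.3f}")
```

A single split yields one accuracy estimate that depends on which rows land in the test set, whereas k-fold averages over k different test sets. That difference is one reason the two strategies can report different accuracies for the same model.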
Copyright (c) 2024 Dian Kurniasari, Rekti Nurul Hidayah, Notiragayu Notiragayu, Warsono Warsono, Rizki Khoirun Nisa
This work is licensed under a Creative Commons Attribution 4.0 International License.