Improving Imbalanced Data Classification Using Stacked Ensemble Learning with Naïve Bayes Variants and Random Forest
DOI: https://doi.org/10.52436/1.jutif.2026.7.2.5308

Keywords: Ensemble learning, Feature selection, Imbalanced data, Interpretability, Stacked classification

Abstract
Classification of imbalanced and heterogeneous datasets poses significant challenges in informatics, particularly in agricultural domains where minority classes are often underrepresented and feature redundancy degrades model performance. This research aims to improve classification performance by developing a stacked ensemble learning framework that integrates probabilistic and tree-based learners to address class imbalance and enhance model interpretability. The framework combines Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes (MNB), and Random Forest (RF) as base learners with Logistic Regression as the meta-learner. Feature selection was performed using Chi-Square and ReliefF to identify the most relevant predictors, while the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the dataset. Two ensemble configurations were evaluated: Ensemble A (GNB + MNB) and Ensemble B (GNB + RF). Experimental results demonstrate that Ensemble B achieved 97% accuracy and a macro F1-score of 0.97, a 5.7% accuracy improvement over the best individual classifier and an 18% improvement in minority-class recall. The integration of probabilistic and tree-based models within a stacked architecture provides an interpretable and effective solution for data-driven decision systems in informatics, particularly in domains that require both high accuracy and model explainability when handling imbalanced datasets.
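For readers who want a concrete picture of the pipeline described above, the sketch below assembles the Ensemble B configuration (GNB + RF base learners, Logistic Regression meta-learner) with scikit-learn and imbalanced-learn. It is a minimal illustration under stated assumptions, not the authors' exact implementation: synthetic data stands in for the agricultural dataset, only the Chi-Square selector is shown (ReliefF requires a third-party implementation such as skrebate), and all hyperparameters are placeholders.

```python
# Minimal sketch of the paper's Ensemble B pipeline. Dataset, feature count,
# and hyperparameters are illustrative assumptions, not the authors' setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced data standing in for the agricultural dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Chi-Square requires non-negative features, so scale to [0, 1] first.
scaler = MinMaxScaler().fit(X_train)
selector = SelectKBest(chi2, k=10).fit(scaler.transform(X_train), y_train)
X_train_sel = selector.transform(scaler.transform(X_train))
X_test_sel = selector.transform(scaler.transform(X_test))

# Oversample the minority class with SMOTE on the training split only,
# so the test set reflects the original class distribution.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train_sel, y_train)

# Ensemble B: probabilistic + tree-based base learners, LR meta-learner.
ensemble_b = StackingClassifier(
    estimators=[("gnb", GaussianNB()),
                ("rf", RandomForestClassifier(n_estimators=100,
                                              random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000))
ensemble_b.fit(X_bal, y_bal)

# Macro-averaged metrics highlight minority-class performance.
print(classification_report(y_test, ensemble_b.predict(X_test_sel)))
```

By default, StackingClassifier trains the meta-learner on cross-validated base-learner predictions, which is the standard guard against the meta-level overfitting the base learners' training-set outputs.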
License
Copyright (c) 2026 Helen Sastypratiwi, Yulianti, Hafiz Muhardi

This work is licensed under a Creative Commons Attribution 4.0 International License.