Optimized RoBERTa–DeBERTa Ensemble for Multi-Class Sentiment Analysis on Highly Imbalanced Data

Authors

  • Xaverius Sika, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
  • Desi Kisbianty, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
  • Marrylinteri Istoningtyas, Informatics Engineering, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
  • Dodo Zaenal Abidin, Master of Information Systems, Faculty of Computer Science, Universitas Dinamika Bangsa, Jambi, Indonesia
  • Afrizal Nehemia Toscany, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.2.5350

Keywords:

Amazon Reviews, Ensemble Learning, Neutral Class Detection, Sentiment Analysis, Transformer Models

Abstract

Multi-class sentiment analysis on highly imbalanced datasets poses substantial challenges for achieving accurate and equitable classification, particularly when neutral sentiments are considerably underrepresented. This study evaluates four fine-tuned transformer models—Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, RoBERTa, and DeBERTa—using a real-world Amazon review dataset comprising over 20,000 user-generated texts. Sentiment labels were derived from star ratings through a standardized mapping scheme. Experimental results show that while BERT achieved the highest overall accuracy (93%), its performance on the minority Neutral class remained limited (F1-score: 0.36). DeBERTa improved Neutral recall to 0.59 but with a slightly lower overall accuracy of 91%. To address this imbalance, two ensemble strategies were explored: a fixed-weight soft voting scheme and an optimized-weight ensemble combining RoBERTa and DeBERTa. The optimized RoBERTa–DeBERTa ensemble yielded the most balanced performance, achieving a Neutral-class F1-score of 0.57 while maintaining 91% overall accuracy. ROC and precision–recall (PR) curve analyses further show a superior sensitivity–precision balance for the optimized ensemble relative to the individual models. The findings indicate that adaptive ensemble weighting can substantially enhance minority-class detection under severe imbalance. This study provides a clear methodological contribution by demonstrating the effectiveness of targeted ensemble optimization and offers practical guidance for developing more balanced and reliable sentiment classification systems.
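The two techniques named in the abstract — mapping star ratings to sentiment classes and weighted soft voting over two models' probabilities — can be sketched as follows. This is a minimal illustration only: the mapping thresholds and the ensemble weight below are assumed values, not the tuned ones reported in the paper.

```python
def rating_to_label(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment class.

    The thresholds here follow a common convention (1-2 = Negative,
    3 = Neutral, 4-5 = Positive) and are an assumption, not necessarily
    the paper's exact mapping scheme.
    """
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


def soft_vote(p_roberta: list, p_deberta: list, w: float = 0.4) -> int:
    """Weighted soft voting over two models' class-probability vectors.

    Blends per-class probabilities with weight w on RoBERTa and (1 - w)
    on DeBERTa, then returns the argmax class index. The weight 0.4 is
    illustrative; in the optimized-weight setting it would be searched
    over a validation set.
    """
    blended = [w * a + (1 - w) * b for a, b in zip(p_roberta, p_deberta)]
    return max(range(len(blended)), key=blended.__getitem__)
```

In practice the weight would be selected on held-out data to maximize a minority-class metric such as the Neutral-class F1-score, rather than fixed in advance as in the fixed-weight baseline.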

References

M. Kumar, L. Khan, and H.-T. Chang, “Evolving techniques in sentiment analysis: a comprehensive review,” PeerJ Comput. Sci., vol. 11, p. e2592, Jan. 2025, doi: 10.7717/peerj-cs.2592.

S. J and K. U, “Sentiment analysis of amazon user reviews using a hybrid approach,” Meas. Sens., vol. 27, p. 100790, Jun. 2023, doi: 10.1016/j.measen.2023.100790.

K. Ghosh, C. Bellinger, R. Corizzo, P. Branco, B. Krawczyk, and N. Japkowicz, “The class imbalance problem in deep learning,” Mach. Learn., vol. 113, no. 7, pp. 4845–4901, Jul. 2024, doi: 10.1007/s10994-022-06268-8.

N. Al Hafidh and A. Al-Karawi, “Advanced Sentiment Analysis of Amazon Electronics Reviews Leveraging BERT: Model Optimization and Evaluation,” Procedia Comput. Sci., vol. 258, pp. 3608–3618, 2025, doi: 10.1016/j.procs.2025.04.616.

X. Zhang, F. Guo, T. Chen, L. Pan, G. Beliakov, and J. Wu, “A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research,” J. Theor. Appl. Electron. Commer. Res., vol. 18, no. 4, pp. 2188–2216, Dec. 2023, doi: 10.3390/jtaer18040110.

H. Ali, E. Hashmi, S. Yayilgan Yildirim, and S. Shaikh, “Analyzing Amazon Products Sentiment: A Comparative Study of Machine and Deep Learning, and Transformer-Based Techniques,” Electronics, vol. 13, no. 7, p. 1305, Mar. 2024, doi: 10.3390/electronics13071305.

S. Tabinda Kokab, S. Asghar, and S. Naz, “Transformer-based deep learning models for the sentiment analysis of social media data,” Array, vol. 14, p. 100157, Jul. 2022, doi: 10.1016/j.array.2022.100157.

K. R. Narejo et al., “EEBERT: An Emoji-Enhanced BERT Fine-Tuning on Amazon Product Reviews for Text Sentiment Classification,” IEEE Access, vol. 12, pp. 131954–131967, 2024, doi: 10.1109/ACCESS.2024.3456039.

Y. Wu, Z. Jin, C. Shi, P. Liang, and T. Zhan, “Research on the Application of Deep Learning-based BERT Model in Sentiment Analysis”.

S. Iftikhar, B. Alluhaybi, M. Suliman, A. Saeed, and K. Fatima, “Amazon products reviews classification based on machine learning, deep learning methods and BERT,” TELKOMNIKA Telecommun. Comput. Electron. Control, vol. 21, no. 5, p. 1084, Oct. 2023, doi: 10.12928/telkomnika.v21i5.24046.

K. Anusuya, “Optimizing Multi-Class Text Classification: A Diverse Stacking Ensemble Framework Utilizing Transformers,” Aug. 13, 2023, arXiv: arXiv:2308.06804. doi: 10.48550/arXiv.2308.06804.

M. F. Almufareh, N. Jhanjhi, N. A. Khan, S. N. Almuayqil, M. Humayun, and D. Javed, “BertSent: Transformer-Based Model for Sentiment Analysis of Penta-Class Tweet Classification,” IEEE Access, vol. 12, pp. 196803–196817, 2024, doi: 10.1109/ACCESS.2024.3515836.

B. Ogunleye, H. Sharma, and O. Shobayo, “Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection,” Big Data Cogn. Comput., vol. 8, no. 9, p. 112, Sep. 2024, doi: 10.3390/bdcc8090112.

R. Qasim, W. H. Bangyal, M. A. Alqarni, and A. Ali Almazroi, “A Fine-Tuned BERT-Based Transfer Learning Approach for Text Classification,” J. Healthc. Eng., vol. 2022, pp. 1–17, Jan. 2022, doi: 10.1155/2022/3498123.

M. Bilal and A. A. Almazroi, “Effectiveness of Fine-tuned BERT Model in Classification of Helpful and Unhelpful Online Customer Reviews,” Electron. Commer. Res., vol. 23, no. 4, pp. 2737–2757, Dec. 2023, doi: 10.1007/s10660-022-09560-w.

K. L. Tan, C. P. Lee, K. M. Lim, and K. S. M. Anbananthen, “Sentiment Analysis With Ensemble Hybrid Deep Learning Model,” IEEE Access, vol. 10, pp. 103694–103704, 2022, doi: 10.1109/ACCESS.2022.3210182.

M. U. Salur and İ. Aydın, “A soft voting ensemble learning-based approach for multimodal sentiment analysis,” Neural Comput. Appl., vol. 34, no. 21, pp. 18391–18406, Nov. 2022, doi: 10.1007/s00521-022-07451-7.

K. Kyritsis, C. M. Liapis, I. Perikos, M. Paraskevas, and V. Kapoulas, “From Transformers to Voting Ensembles for Interpretable Sentiment Classification: A Comprehensive Comparison,” Computers, vol. 14, no. 5, p. 167, Apr. 2025, doi: 10.3390/computers14050167.

H. Zou and Z. Wang, “A semi-supervised short text sentiment classification method based on improved Bert model from unlabelled data,” J. Big Data, vol. 10, no. 1, p. 35, Mar. 2023, doi: 10.1186/s40537-023-00710-x.

S. Biswas, K. Young, and J. Griffith, “A Comparison of Automatic Labelling Approaches for Sentiment Analysis,” in Proceedings of the 11th International Conference on Data Science, Technology and Applications, 2022, pp. 312–319. doi: 10.5220/0011265900003269.

P. D. Moral, S. Nowaczyk, and S. Pashami, “Why Is Multiclass Classification Hard?,” IEEE Access, vol. 10, pp. 80448–80462, 2022, doi: 10.1109/ACCESS.2022.3192514.

A. Rahali and M. A. Akhloufi, “End-to-End Transformer-Based Models in Textual-Based NLP,” AI, vol. 4, no. 1, pp. 54–110, Jan. 2023, doi: 10.3390/ai4010004.

D. Z. Abidin, M. Rosario, and A. Sadikin, “Improving Term Deposit Customer Prediction Using Support Vector Machine with SMOTE and Hyperparameter Tuning in Bank Marketing Campaigns,” J. Tek. Inform. Jutif, vol. 6, no. 3, 2025, doi: 10.52436/1.jutif.2025.6.3.4585.

D. Z. Abidin, A. Siswanto, C. Saputra, B. Betantiyo, and A. Nehemia Toscany, “Enhancing Fake News Detection on Imbalanced Data Using Resampling Techniques and Classical Machine Learning Models,” J. Tek. Inform. Jutif, vol. 6, no. 5, pp. 3769–3786, Oct. 2025, doi: 10.52436/1.jutif.2025.6.5.5177.

V. K. Agbesi et al., “Pre-Trained Transformer-Based Models for Text Classification Using Low-Resourced Ewe Language,” Systems, vol. 12, no. 1, p. 1, Dec. 2023, doi: 10.3390/systems12010001.

N. J. Prottasha et al., “Transfer Learning for Sentiment Analysis Using BERT Based Supervised Fine-Tuning,” Sensors, vol. 22, no. 11, p. 4157, May 2022, doi: 10.3390/s22114157.

M. K. Shaik Vadla, M. A. Suresh, and V. K. Viswanathan, “Enhancing Product Design through AI-Driven Sentiment Analysis of Amazon Reviews Using BERT,” Algorithms, vol. 17, no. 2, p. 59, Jan. 2024, doi: 10.3390/a17020059.

N. Ding et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nat. Mach. Intell., vol. 5, no. 3, pp. 220–235, Mar. 2023, doi: 10.1038/s42256-023-00626-4.

M. M. Krell, M. Kosec, S. P. Perez, and A. Fitzgibbon, “Efficient Sequence Packing without Cross-contamination: Accelerating Large Language Models without Impacting Performance,” Oct. 05, 2022, arXiv: arXiv:2107.02027. doi: 10.48550/arXiv.2107.02027.

S. Ramakrishnan and L. D. Dhinesh Babu, “Improving Multi-Label Emotion Classification on Imbalanced Social Media Data With BERT and Clipped Asymmetric Loss,” IEEE Access, vol. 13, pp. 60589–60601, 2025, doi: 10.1109/ACCESS.2025.3557091.

M. Rehan, M. S. I. Malik, and M. M. Jamjoom, “Fine-Tuning Transformer Models Using Transfer Learning for Multilingual Threatening Text Identification,” IEEE Access, vol. 11, pp. 106503–106515, 2023, doi: 10.1109/ACCESS.2023.3320062.

R. Pan, J. A. García-Díaz, F. Garcia-Sanchez, and R. Valencia-García, “Evaluation of transformer models for financial targeted sentiment analysis in Spanish,” PeerJ Comput. Sci., vol. 9, p. e1377, May 2023, doi: 10.7717/peerj-cs.1377.

M. A. Shah, M. J. Iqbal, N. Noreen, and I. Ahmed, “An Automated Text Document Classification Framework using BERT,” Int. J. Adv. Comput. Sci. Appl., vol. 14, no. 3, 2023, doi: 10.14569/IJACSA.2023.0140332.

T. Ahmed, S. Ivan, M. Kabir, H. Mahmud, and K. Hasan, “Performance analysis of transformer-based architectures and their ensembles to detect trait-based cyberbullying,” Soc. Netw. Anal. Min., vol. 12, no. 1, p. 99, Dec. 2022, doi: 10.1007/s13278-022-00934-4.

Y. Cao, Z. Sun, L. Li, and W. Mo, “A Study of Sentiment Analysis Algorithms for Agricultural Product Reviews Based on Improved BERT Model,” Symmetry, vol. 14, no. 8, p. 1604, Aug. 2022, doi: 10.3390/sym14081604.

Published

2026-04-23

How to Cite

[1]
X. Sika, D. Kisbianty, M. Istoningtyas, D. Z. Abidin, and A. N. Toscany, “Optimized RoBERTa–DeBERTa Ensemble for Multi-Class Sentiment Analysis on Highly Imbalanced Data”, J. Tek. Inform. (JUTIF), vol. 7, no. 2, pp. 1964–1980, Apr. 2026.