A BiLSTM-Based Approach For Speech Emotion Recognition In Conversational Indonesian Audio using SMOTE

Nariswari Nur Shabrina; Fatan Kasyidi; Ridwan Ilyas

doi:10.52436/1.jutif.2025.6.5.5183

Authors

Nariswari Nur Shabrina, Computer Science, Universitas Jenderal Achmad Yani, Indonesia
Fatan Kasyidi, Computer Science, Universitas Jenderal Achmad Yani, Indonesia
Ridwan Ilyas, Computer Science, Universitas Jenderal Achmad Yani, Indonesia

Keywords:

BiLSTM, Bootstrap Aggregating, Nadam, Nyquist Shannon, One-vs-All, SMOTE, Speech Emotion Recognition

Abstract

Speech Emotion Recognition (SER) identifies human emotions by analyzing voice signals, with a focus on pitch, intonation, and tempo. This study uses a sampling rate of 48,000 Hz, selected as optimal in line with the Nyquist-Shannon sampling theorem so that the signal can be reconstructed accurately. Audio features are extracted as Mel-Frequency Cepstral Coefficients (MFCC), which capture changes in frequency and rhythm over time. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic samples for the minority class, giving the model a more balanced training set. Emotion classification follows a One-vs-All (OvA) scheme, in which a separate model is built for each emotion to improve detection. Each model is a Bidirectional Long Short-Term Memory (BiLSTM) network, which reads the sequence in both directions and so captures context that helps it model complex speech patterns. The Nadam optimizer (Nesterov-accelerated Adaptive Moment Estimation) is used to speed up convergence and stabilize weight updates, and Bagging (Bootstrap Aggregating) is applied to reduce overfitting and improve prediction accuracy. The results show that this combination of techniques achieves 78% accuracy in classifying vocal emotions, a contribution to emotion detection systems, particularly for under-resourced languages.
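
To illustrate the oversampling step mentioned in the abstract, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote_sketch`, the neighbour count `k`, and the toy two-dimensional feature vectors are all assumptions made for illustration, and a real pipeline would more likely apply a library implementation to MFCC-derived features.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration; requires at least two samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the remaining minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy 2-D vectors standing in for features of an under-represented emotion.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new data stay inside the region already occupied by that class rather than being exact duplicates of existing recordings.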
