WEB-BASED IMAGE CAPTIONING FOR IMAGES OF TOURIST ATTRACTIONS IN PURBALINGGA USING TRANSFORMER ARCHITECTURE AND TEXT-TO-SPEECH

  • Safa Muazam, Informatics, Engineering Faculty, Universitas Jenderal Soedirman, Indonesia
  • Yogiek Indra Kurniawan, Informatics, Engineering Faculty, Universitas Jenderal Soedirman, Indonesia
  • Dadang Iskandar, Informatics, Engineering Faculty, Universitas Jenderal Soedirman, Indonesia
Keywords: image captioning, text-to-speech, tourist attractions, transformer

Abstract

Purbalingga is a regency in Central Java Province, Indonesia, known for its natural beauty and tourist destinations. Many tourists capture their visits in photographs, which they then upload to social media. A single picture, however, can carry a great deal of information, and each viewer may interpret it differently; without a caption, this information can be difficult to extract. Image captioning addresses this challenge by automatically generating textual descriptions of images, and text-to-speech further improves accessibility by allowing visually impaired users to hear those descriptions. This research develops an image captioning model for images of tourist attractions in Purbalingga using a transformer architecture combined with ResNet50. The transformer employs an attention mechanism to learn the context and relationships between inputs and outputs, while ResNet50 is a robust convolutional network used for image feature extraction. Model evaluation with BLEU metrics, which compare generated sentences to reference sentences, yields best results of BLEU-{1, 2, 3, 4} = {0.672, 0.559, 0.489, 0.437}. Experiments indicate that increasing the embedding dimension and the number of layers lengthens training time and lowers BLEU scores, while changing the number of attention heads has minimal impact on the results. The best model is deployed in a web-based application built with the SDLC waterfall method, the Flask framework, and a MySQL database. The application lets users upload tourist attraction images, receive automatic descriptions in Indonesian, and listen to the captions read aloud through a Web Speech API-based text-to-speech feature. Blackbox testing produced valid outcomes for all test cases, indicating that the application operates as required and is ready for use.
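As a rough illustration of the BLEU-1 through BLEU-4 evaluation described above, the sketch below computes corpus-level BLEU scores with NLTK. The tokenised Indonesian captions, the use of corpus_bleu, and the smoothing function are illustrative assumptions, not the authors' exact evaluation code.

```python
# Minimal sketch (assumption): corpus-level BLEU-1..BLEU-4 computed with NLTK,
# comparing generated captions against reference captions, as in the abstract.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example data: each generated caption is paired with a list of
# one or more tokenised reference captions in Indonesian.
references = [
    [["air", "terjun", "di", "purbalingga", "dengan", "bebatuan"]],
    [["wisatawan", "berfoto", "di", "depan", "taman", "bunga"]],
]
hypotheses = [
    ["air", "terjun", "dengan", "bebatuan", "di", "purbalingga"],
    ["wisatawan", "berfoto", "di", "taman", "bunga"],
]

smooth = SmoothingFunction().method1  # avoids zero scores on short captions
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))  # uniform n-gram weights up to n
    score = corpus_bleu(references, hypotheses,
                        weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```

In practice, the reported scores would be computed over the full test split of the Purbalingga tourist-attraction caption dataset rather than a handful of sentences.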



Published
2024-10-29
How to Cite
[1] S. Muazam, Y. I. Kurniawan, and D. Iskandar, “WEB-BASED IMAGE CAPTIONING FOR IMAGES OF TOURIST ATTRACTIONS IN PURBALINGGA USING TRANSFORMER ARCHITECTURE AND TEXT-TO-SPEECH”, J. Tek. Inform. (JUTIF), vol. 5, no. 5, pp. 1460–1478, Oct. 2024.