COMPARISON OF IMAGE SEGMENTATION METHOD IN IMAGE CHARACTER EXTRACTION PREPROCESSING USING OPTICAL CHARACTER RECOGNITION

  • Condro Wibawa, Faculty of Computer Science, Universitas Gunadarma, Indonesia
  • Dessy Tri Anggraeni, Faculty of Computer Science, Universitas Gunadarma, Indonesia
Keywords: Image Processing, Niblack, Otsu Thresholding, Sauvola

Abstract

Many documents now exist as digital images obtained from various sources, and these images must be processed automatically by computer. One such document image processing task is text extraction using Optical Character Recognition (OCR). In many cases, however, OCR cannot read the text in a digital image accurately, for reasons such as poor image quality or noise. Accurate results require a good-quality image, so the image needs to be preprocessed. The preprocessing methods used in this study are Otsu thresholding binarization, Niblack, and Sauvola, and the characters are extracted with the Tesseract library in Python. The test results show that extracting text directly from the original image gives the best results on average, with an average character match rate of 77.27%. By comparison, the average match rate was 70.27% with Otsu thresholding, 69.67% with Sauvola, and only 35.72% with Niblack. In some of the test cases in this research, however, the Sauvola and Otsu methods gave better results.
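
The three thresholding rules and the match-rate comparison named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, works on flat lists of 8-bit grey values instead of real images, and leaves out the Tesseract step. The function names, the parameter defaults (k = -0.2 for Niblack, k = 0.5 and R = 128 for Sauvola), and the use of `difflib.SequenceMatcher` as the character match rate are all assumptions made for illustration; the abstract does not state how the match rate was computed.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev


def otsu_threshold(pixels):
    """Global threshold (0-255) that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= candidate threshold
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def niblack_threshold(window, k=-0.2):
    """Local threshold T = m + k * s over one window (k is an assumed default)."""
    return mean(window) + k * pstdev(window)


def sauvola_threshold(window, k=0.5, r=128):
    """Local threshold T = m * (1 + k * (s / R - 1)); k and R are assumed defaults."""
    m, s = mean(window), pstdev(window)
    return m * (1 + k * (s / r - 1))


def match_rate(expected, recognized):
    """Hypothetical character match rate in percent, based on difflib similarity."""
    return 100 * SequenceMatcher(None, expected, recognized).ratio()


if __name__ == "__main__":
    # Toy data: dark "ink" pixels and bright "paper" pixels.
    pixels = [10] * 50 + [200] * 50
    print("Otsu threshold:", otsu_threshold(pixels))
    window = [90, 100, 110, 100]
    print("Niblack threshold:", niblack_threshold(window))
    print("Sauvola threshold:", sauvola_threshold(window))
    print("Match rate (%):", match_rate("Universitas", "Universltas"))
```

Otsu computes one threshold for the whole image, while Niblack and Sauvola compute a separate threshold for each pixel from the mean and standard deviation of its neighbourhood. In a full pipeline the binarized image would then be passed to the OCR engine, and the recognized text compared with the ground-truth text.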

Published
2023-06-26
How to Cite
[1]
C. Wibawa and D. T. Anggraeni, “COMPARISON OF IMAGE SEGMENTATION METHOD IN IMAGE CHARACTER EXTRACTION PREPROCESSING USING OPTICAL CHARACTER RECOGINITON”, J. Tek. Inform. (JUTIF), vol. 4, no. 3, pp. 583-589, Jun. 2023.