E-Chars74k: An Extended Scene Character Dataset with Augmentation Insights and Benchmarks

Pages: 133-150

Author(s)

Payel Sengupta 1,*, Tauseef Khan 2, Ayatullah Faruk Mollah 3

1. Department of Computer Science and Engineering, Brainware University, Barasat, West Bengal 700125, India

2. School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, 522237, India

3. Department of Computer Science and Engineering, Aliah University, Kolkata 700160, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2025.06.08

Received: 5 Dec. 2024 / Revised: 11 May 2025 / Accepted: 16 Oct. 2025 / Published: 8 Dec. 2025

Index Terms

Scene Character Recognition, Deep Learning, CNN, Augmentation, Chars74k Dataset

Abstract

Semantic understanding of camera-captured scene text images is an important problem in computer vision. Scene character recognition is the pivotal task in this problem, and deep learning is currently the most promising approach. However, the limited sample size of scene character datasets is a major hindrance to training deep networks. In this paper, we present (i) various augmentation techniques for increasing the sample size of such datasets, along with associated insights, (ii) an extended version of the popular Chars74k dataset (herein referred to as E-Chars74k), and (iii) the benchmark performance on the developed E-Chars74k dataset. Experiments on various sets of data, such as digits, alphabets and their combination, drawn from usual as well as wild scenarios, show a significant performance gain (a 20%-30% increase in scene character recognition accuracy). In all these experiments, a deep convolutional neural network with two convolution-pooling pairs is trained with the same training-test partition to ensure a fair comparison.
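As an illustration of how augmentation can enlarge a small character dataset, the following is a minimal sketch in plain Python. The abstract does not list the specific techniques used to build E-Chars74k, so the three transformations shown here (translation, additive Gaussian noise, and brightness scaling), along with all function names and parameter values, are assumptions chosen for illustration only.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def clip(value: float) -> int:
    """Round and clamp a pixel value to the valid 0..255 range."""
    return max(0, min(255, int(round(value))))


def shift_right(img: Image, pixels: int, fill: int = 0) -> Image:
    """Translate the image to the right, padding the vacated columns."""
    width = len(img[0])
    pixels = max(0, min(pixels, width))
    return [[fill] * pixels + row[: width - pixels] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def scale_brightness(img: Image, factor: float) -> Image:
    """Multiply every pixel by a brightness factor."""
    return [[clip(p * factor) for p in row] for row in img]


def augment(img: Image, seed: int = 0) -> List[Image]:
    """Return the original image plus three augmented variants.

    The choice of transformations and their parameters is hypothetical;
    it is not taken from the paper.
    """
    rng = random.Random(seed)
    return [
        img,
        shift_right(img, 1),
        add_noise(img, sigma=10.0, rng=rng),
        scale_brightness(img, 1.2),
    ]
```

Applied to every sample in a dataset, a function like `augment` would multiply the sample count by four while leaving the class labels unchanged. In practice, the set of transformations should be limited to those that do not alter the identity of a character.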

Cite This Paper

Payel Sengupta, Tauseef Khan, Ayatullah Faruk Mollah, "E-Chars74k: An Extended Scene Character Dataset with Augmentation Insights and Benchmarks", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.17, No.6, pp. 133-150, 2025. DOI: 10.5815/ijigsp.2025.06.08
