Text Extraction from Natural Scene Images using OpenCV and CNN

Full Text (PDF, 713KB), PP.48-54


Vaibhav Goel 1,*, Vaibhav Kumar 1, Amandeep Singh Jaggi 1, Preeti Nagrath 1

1. Computer Science Department, Bharati Vidyapeeth’s College of Engineering, New Delhi, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2019.09.06

Received: 6 Jun. 2019 / Revised: 20 Jun. 2019 / Accepted: 24 Jun. 2019 / Published: 8 Sep. 2019

Index Terms

Text Extraction, Deep Learning, OpenCV, Natural Scene Images, CNN, Optical Character Recognition


Abstract

The rapid worldwide growth of camera-equipped smartphones has magnified the importance of computer vision tasks and given rise to a vast number of opportunities in the field. One of the major research areas is the extraction of text embedded in natural scene images. Natural scene images are images captured by a camera, in which the background is arbitrary and the range of colors may be diverse. When text is present in such images, it is usually difficult for a machine to detect and extract it because of a number of factors. This paper presents a technique that combines the Open Source Computer Vision Library (OpenCV) with a Convolutional Neural Network (CNN) to extract English text from images efficiently. The CNN model is based on a two-stage pipeline that uses a single neural network to directly detect the characters in scene images. It eliminates the unnecessary intermediate steps that made previous approaches to this task slower and less accurate, thereby improving both the time complexity and the performance of the algorithm.
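The shape of a two-stage extraction pipeline can be illustrated with a minimal sketch. It assumes the two stages are region detection followed by recognition, which the abstract does not spell out. Every name here (`Box`, `detect_regions`, `crop`, `extract_text`, `width_label`) is hypothetical, and the detector is a toy thresholding stand-in rather than the paper's CNN or any OpenCV call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy grayscale image: a list of rows, each a list of 0-255 intensities.
Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    x0: int  # inclusive left column of a candidate text region
    x1: int  # exclusive right column


def detect_regions(image: Image, threshold: int = 128) -> List[Box]:
    """Stage 1 stand-in (assumption): group adjacent columns containing dark pixels."""
    width = len(image[0]) if image else 0
    has_ink = [any(row[x] < threshold for row in image) for x in range(width)]
    boxes: List[Box] = []
    start = None
    # The trailing False is a sentinel that closes a run touching the right edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            boxes.append(Box(start, x))
            start = None
    return boxes


def crop(image: Image, box: Box) -> Image:
    """Cut the columns covered by a box out of every row."""
    return [row[box.x0:box.x1] for row in image]


def extract_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[str]:
    """Run stage 1 (detect), then stage 2 (recognize) on each detected region."""
    return [recognize(crop(image, box)) for box in detect(image)]


def width_label(region: Image) -> str:
    """Stage 2 stand-in (assumption): report the region width instead of real text."""
    return f"{len(region[0])}px"


toy_image = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
]
print(extract_text(toy_image, detect_regions, width_label))
```

In a real system the `detect` argument would wrap a trained detector and `recognize` an OCR engine. Passing both as callables keeps the two stages independently replaceable.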

Cite This Paper

Vaibhav Goel, Vaibhav Kumar, Amandeep Singh Jaggi, Preeti Nagrath, "Text Extraction from Natural Scene Images using OpenCV and CNN", International Journal of Information Technology and Computer Science (IJITCS), Vol.11, No.9, pp.48-54, 2019. DOI: 10.5815/ijitcs.2019.09.06

