An Efficient Approach for Text-to-Speech Conversion Using Machine Learning and Image Processing Technique

Smt. Swaroopa Shastri¹, Shashank Vishwakarma¹,*

1. Department of CSE (MCA), Visvesvaraya Technological University, Centre for PG Studies, Kalaburagi, India

Received: 11 Sep. 2022 / Revised: 16 Oct. 2022 / Accepted: 15 Nov. 2022 / Published: 8 Aug. 2023

Index Terms: Image processing, MSER, OCR, Geometrical properties, SWT, TTS Synthesizer


This study explores the conversion of English text in images to Hindi, first to text and subsequently to speech. The first part of the implementation is text recognition from images, in which two approaches are used for text character detection: maximally stable extremal regions (MSER) and grayscale conversion. The second part of the paper deals with geometric filtering in combination with the stroke width transform (SWT). Subsequently, letters are grouped to detect text sequences, which are then split into words. Finally, a spell check with 96 percent accuracy is performed using naive Bayes and decision tree algorithms, followed by optical character recognition (OCR) to digitize the words. The recognized text is then given to our text-to-speech (TTS) synthesizer, which converts it to Hindi speech. The output is evaluated on aspects such as speech speed, sound quality, pronunciation, and clarity.
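The geometric and stroke-width filtering of candidate regions, followed by grouping the surviving letters into text lines and words, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each MSER candidate has already been reduced to a bounding box, a solidity value, and a list of SWT stroke widths. The `Region` class, the `is_text_like` and `group_into_words` functions, and every threshold (maximum aspect ratio 3, minimum solidity 0.3, maximum stroke-width variation 0.4, word-gap factor 0.6) are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass
class Region:
    """A candidate character region (e.g. one MSER blob) with precomputed properties."""
    x: int                      # left edge of bounding box (pixels)
    y: int                      # top edge of bounding box (pixels)
    w: int                      # bounding-box width (pixels, > 0)
    h: int                      # bounding-box height (pixels, > 0)
    solidity: float             # region area / convex-hull area
    stroke_widths: list[float]  # SWT values sampled inside the region


# Hypothetical, illustrative thresholds (not taken from the paper).
MAX_ASPECT = 3.0
MIN_SOLIDITY = 0.3
MAX_STROKE_CV = 0.4  # max coefficient of variation (std / mean) of stroke width


def is_text_like(r: Region) -> bool:
    """Geometric filter followed by a stroke-width-variation filter."""
    aspect = max(r.w, r.h) / min(r.w, r.h)
    if aspect > MAX_ASPECT or r.solidity < MIN_SOLIDITY:
        return False
    # Characters tend to have near-constant stroke width, so a low
    # std/mean ratio suggests text rather than background clutter.
    return pstdev(r.stroke_widths) / mean(r.stroke_widths) <= MAX_STROKE_CV


def group_into_words(regions: list[Region], gap_factor: float = 0.6) -> list[list[Region]]:
    """Group character boxes into text lines, then split each line into words.

    A box joins a line if its vertical centre lies within half a character
    height of the line's last box; within a line, a horizontal gap wider than
    gap_factor * median character width starts a new word.
    """
    lines: list[list[Region]] = []
    for r in sorted(regions, key=lambda r: r.y + r.h / 2):
        cy = r.y + r.h / 2
        for line in lines:
            ref = line[-1]
            if abs((ref.y + ref.h / 2) - cy) <= ref.h / 2:
                line.append(r)
                break
        else:  # no existing line matched: start a new one
            lines.append([r])

    words: list[list[Region]] = []
    for line in lines:
        line.sort(key=lambda r: r.x)
        char_w = median(r.w for r in line)
        word = [line[0]]
        for prev, cur in zip(line, line[1:]):
            if cur.x - (prev.x + prev.w) > gap_factor * char_w:
                words.append(word)
                word = []
            word.append(cur)
        words.append(word)
    return words


# Usage on made-up boxes: two close letters, one separate letter,
# one long thin bar, and one region with erratic stroke width.
boxes = [
    Region(10, 10, 8, 12, 0.8, [2.0, 2.1, 1.9]),
    Region(20, 10, 8, 12, 0.7, [2.0, 2.0, 2.2]),
    Region(40, 10, 8, 12, 0.9, [1.8, 2.0, 2.1]),
    Region(60, 5, 40, 4, 0.9, [2.0, 2.0, 2.0]),   # aspect ratio 10: rejected
    Region(80, 10, 8, 12, 0.8, [1.0, 4.0, 0.5]),  # stroke-width CV ~0.84: rejected
]
kept = [r for r in boxes if is_text_like(r)]
words = group_into_words(kept)
print([len(w) for w in words])  # → [2, 1]
```

Thresholding on the ratio of standard deviation to mean, rather than on raw stroke width, keeps the filter independent of font size; in practice these values would be tuned on the target images.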

Swaroopa Shastri, Shashank Vishwakarma, "An Efficient Approach for Text-to-Speech Conversion Using Machine Learning and Image Processing Technique", International Journal of Engineering and Manufacturing (IJEM), Vol.13, No.4, pp. 44-49, 2023. DOI:10.5815/ijem.2023.04.05


