Work place: A. J. Institute of Engineering & Technology Research Center, Visvesvaraya Technological University, Belagavi, Karnataka, India
E-mail: sureshasss@gmail.com
Website: https://orcid.org/0000-0003-2578-0552
Research Interests: Signature analysis and retrieval, clustering, biometrics, image processing, pattern recognition, symbolic data analysis
Biography
Suresha D. obtained his B.E. and M.Tech. degrees in Computer Science and Engineering from Kuvempu University, Shankaraghatta, and Visvesvaraya Technological University (VTU), Belagavi, India, in 2001 and 2010, respectively. He earned his Ph.D. in Computer Science and Engineering from VTU, Belagavi, in 2018. He is currently serving as a Professor and Head of the Department of Computer Science and Engineering at Srinivas Institute of Technology, Mangaluru, India. Dr. Suresha has published numerous peer-reviewed research papers in reputed national and international journals and conference proceedings. His research areas encompass signature analysis and retrieval, clustering, biometrics, image processing, pattern recognition, and symbolic data analysis. He is a life member of several professional organizations, including the Computer Society of India and the Indian Society for Technical Education.
By Divya Suresha D.
DOI: https://doi.org/10.5815/ijigsp.2026.02.06, Pub. Date: 8 Apr. 2026
This research outlines a comprehensive dual-modality speech recognition system designed to support hearing-impaired students in understanding spoken Kannada through synchronized processing of auditory signals and visual articulatory cues. The approach leverages deep learning to extract speech-related features from spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs) for the audio stream, and discriminative lip-movement features via Convolutional Neural Networks (CNNs) and Temporal Convolutional Networks (TCNs) for the visual stream. A hybrid architecture, KanAVNet (Kannada Audio-Visual Network), built on a CNN–BiLSTM framework, is combined with a Connectionist Temporal Classification (CTC) loss function to enable robust sequence-to-sequence mapping while addressing the temporal alignment challenges inherent in audio-visual speech recognition (AVSR). The system is trained on a custom-developed Kannada audio-visual dataset, addressing the scarcity of regional-language AVSR resources. Experiments show that the model achieves an accuracy of 93.2%, a Word Error Rate (WER) of 9.8%, and an F1 score of 91.2%, outperforming unimodal baselines and existing multimodal models. This research highlights the effectiveness of multimodal fusion strategies in noisy environments and demonstrates the potential of AI-driven tools to promote accessible, inclusive education for students with auditory impairments.
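To make the described pipeline concrete, the sketch below shows one plausible way to wire an audio-visual CNN–BiLSTM model with a CTC head in PyTorch. It is a minimal illustration, not the paper's implementation: the layer sizes, fusion by feature concatenation, the two-layer BiLSTM, the vocabulary size, and the omission of the TCN temporal stage on the visual branch are all assumptions made for brevity.

```python
# Minimal sketch of a KanAVNet-style audio-visual CNN-BiLSTM-CTC model.
# Hypothetical layer sizes and concatenation-based fusion; the abstract
# does not specify the exact architecture.
import torch
import torch.nn as nn

class KanAVNetSketch(nn.Module):
    def __init__(self, n_mfcc=40, vis_feat=512, hidden=256, n_classes=60):
        super().__init__()
        # Audio branch: 1-D CNN over per-frame MFCC vectors.
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(n_mfcc, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Visual branch: 2-D CNN per lip-ROI frame, pooled to one vector.
        self.visual_cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.visual_proj = nn.Linear(64, vis_feat)
        # Fusion by concatenation, then BiLSTM and a CTC output head
        # (class 0 is reserved as the CTC blank symbol).
        self.bilstm = nn.LSTM(128 + vis_feat, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc, lips):
        # mfcc: (B, T, n_mfcc); lips: (B, T, 1, H, W), frame-synchronized.
        a = self.audio_cnn(mfcc.transpose(1, 2)).transpose(1, 2)  # (B, T, 128)
        B, T = lips.shape[:2]
        v = self.visual_cnn(lips.flatten(0, 1)).flatten(1)        # (B*T, 64)
        v = self.visual_proj(v).view(B, T, -1)                    # (B, T, vis_feat)
        x, _ = self.bilstm(torch.cat([a, v], dim=-1))
        return self.head(x).log_softmax(-1)                       # CTC log-probs

# One training step with CTC loss on dummy data, for illustration only.
model = KanAVNetSketch()
mfcc = torch.randn(2, 75, 40)             # 2 clips, 75 frames, 40 MFCCs each
lips = torch.randn(2, 75, 1, 64, 64)      # matching lip-ROI frames
targets = torch.randint(1, 60, (2, 20))   # token IDs; 0 is the CTC blank
log_probs = model(mfcc, lips).transpose(0, 1)  # (T, B, C) as CTCLoss expects
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 75),
                           target_lengths=torch.full((2,), 20))
loss.backward()
```

CTC is a natural fit here because the frame-synchronized audio-visual sequence is much longer than the target token sequence, and the loss marginalizes over all monotonic alignments rather than requiring frame-level labels.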