Divya

Workplace: A. J. Institute of Engineering & Technology Research Center, Visvesvaraya Technological University, Belagavi - 590018, Karnataka, India

E-mail: divyapajiet@gmail.com

ORCID: https://orcid.org/0009-0002-4305-1521

Research Interests: Image Processing, Computer Vision, Artificial Intelligence

Biography

Divya is currently serving as an Assistant Professor in the Department of Information Science and Engineering at A. J. Institute of Engineering and Technology (AJIET), Mangaluru, India. She holds Bachelor's and Master's degrees in Computer Science and Engineering from Visvesvaraya Technological University (VTU), along with an MBA from Sikkim Manipal University. She is presently pursuing her Ph.D. at VTU, Belagavi, India. She has published multiple research papers in reputed national and international conferences and journals. Her areas of research interest include Image Processing, Computer Vision, and Artificial Intelligence. A life member of ISTE, she actively engages in professional growth through participation in faculty development programs, workshops, and certification courses offered by platforms such as NPTEL and Coursera.

Author Articles
KanAVNet: A CNN-BiLSTM-CTC-Based Audio-Visual Speech Recognition System for Kannada to Assist the Hearing Impaired

By Divya Suresha D

DOI: https://doi.org/10.5815/ijigsp.2026.02.06, Pub. Date: 8 Apr. 2026

This research presents a dual-modality speech recognition system designed to help hearing-impaired students understand spoken Kannada through synchronized processing of auditory signals and visual articulatory cues. The approach leverages deep learning to extract speech-related features from spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs) for the audio stream, and discriminative lip-movement features via Convolutional Neural Networks (CNNs) and Temporal Convolutional Networks (TCNs) for the visual stream. A hybrid architecture, KanAVNet (Kannada Audio-Visual Network), built on a CNN–BiLSTM framework, is trained with a Connectionist Temporal Classification (CTC) loss function to enable robust sequence-to-sequence mapping while addressing temporal alignment challenges in audio-visual speech recognition. The system is trained on a custom-developed Kannada audio-visual dataset, addressing the scarcity of regional-language AVSR resources. Empirical results show that the model achieves an accuracy of 93.2%, a Word Error Rate (WER) of 9.8%, and an F1 score of 91.2%, outperforming baseline unimodal and existing multimodal models. This work highlights the effectiveness of multimodal fusion in noisy environments and showcases the potential of AI-driven tools for accessible and inclusive education for students with auditory impairments.
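To illustrate the general shape of a CNN–BiLSTM model trained with a CTC loss, as described in the abstract, here is a minimal PyTorch sketch. All layer sizes, the vocabulary size, and the audio-only input are illustrative assumptions, not the published KanAVNet configuration (which also fuses a visual stream).

```python
import torch
import torch.nn as nn

class AVNetSketch(nn.Module):
    """Hypothetical CNN-BiLSTM-CTC sketch; layer sizes are illustrative."""
    def __init__(self, n_mfcc=40, hidden=128, n_classes=50):
        super().__init__()
        # CNN front end over the MFCC "image": (batch, 1, n_mfcc, time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency only, keep all time steps
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat = 64 * (n_mfcc // 4)            # channels x remaining frequency bins
        self.bilstm = nn.LSTM(feat, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes + 1)   # +1 for the CTC blank label

    def forward(self, x):                    # x: (batch, 1, n_mfcc, time)
        z = self.cnn(x)                      # (batch, 64, n_mfcc//4, time)
        z = z.permute(0, 3, 1, 2).flatten(2) # (batch, time, feat)
        z, _ = self.bilstm(z)                # (batch, time, 2*hidden)
        return self.fc(z).log_softmax(-1)    # per-frame log-probs over classes+blank

model = AVNetSketch()
audio = torch.randn(2, 1, 40, 120)           # 2 clips, 40 MFCCs, 120 frames
log_probs = model(audio)                     # (2, 120, 51)

# CTC loss expects (time, batch, classes); targets are label-index sequences
ctc = nn.CTCLoss(blank=50)
targets = torch.randint(0, 50, (2, 15))      # dummy transcripts, 15 labels each
loss = ctc(log_probs.permute(1, 0, 2), targets,
           input_lengths=torch.full((2,), 120),
           target_lengths=torch.full((2,), 15))
```

The CTC loss is what lets the model map a long frame sequence to a short label sequence without frame-level alignments, which is the temporal-alignment challenge the abstract mentions.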

Other Articles