J. Bennilo Fernandes

Work place: Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh

E-mail: bennij05@gmail.com


Research Interests: Computer systems and computational processes, Computational Learning Theory, Pattern Recognition, Embedded System, Image Processing, Speech Recognition


J. Bennilo Fernandes received the Bachelor's degree in Electronics and Communication Engineering (ECE) from LIT, Anna University, and the Master of Technology degree in Embedded Systems (ES) from HIT, Tamil Nadu, India. He is currently working as an Assistant Professor in the Department of ECE, Koneru Lakshmaiah Education Foundation (K L University, Vijayawada), Andhra Pradesh. His research interests include speech recognition, image processing, applications of machine learning and embedded systems. He has published 5 papers in international journals and 3 papers in international/national conferences. He has 6 years of teaching experience.

Author Articles
Enhanced Deep Hierarchal GRU & BILSTM using Data Augmentation and Spatial Features for Tamil Emotional Speech Recognition

By J. Bennilo Fernandes, Kasiprasad Mannepalli

DOI: https://doi.org/10.5815/ijmecs.2022.03.03, Pub. Date: 8 Jun. 2022

The Recurrent Neural Network (RNN) is well suited to emotional speech recognition because it uses a constantly time-shifting property. Although RNNs give good results, and GRU, LSTM and BILSTM address the gradient problem, overfitting still reduces efficiency. Hence, in this paper, five deep learning architectures are designed to overcome these issues using data augmentation and spatial features. The five architectures, Enhanced Deep Hierarchal LSTM & GRU (EDHLG), EDHBG, EDHGL, EDHGB and EDHGG, are developed with dropout layers. The output learned by the LSTM layer is given as input to the GRU layer for deeper learning. Thus the gradient problem is reduced and the accuracy for each emotion is increased. To further enhance accuracy, spatial features are concatenated with MFCC features. In the experimental evaluation on the Tamil emotional dataset, all five models performed well: EDHLG reaches 93.12% accuracy, EDHGL 92.56%, EDHBG 95.42%, EDHGB 96% and EDHGG 94%. By comparison, the average accuracy of a single LSTM layer is 74%, and that of a single BILSTM layer is 77%. EDHGB outperforms almost all other systems, with an optimal accuracy of 94.27% and a maximum overall accuracy of 95.99%. On the Tamil emotion data, the happy, fearful, angry, sad and neutral states are predicted with 100% accuracy, while disgust reaches 94% and boredom 82%. The training and evaluation times of EDHGB are 4.43 min and 0.42 min respectively, lower than those of the other models. Hence, by varying the LSTM, BILSTM and GRU layers, an extensive experimental analysis is carried out on the Tamil dataset; EDHGB is superior to the other models and gains around 26% in efficiency over the basic LSTM and BILSTM models.
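
The abstract states that spatial features are concatenated with MFCC features but does not say how. One plausible reading is frame-wise concatenation, sketched below in plain Python. The function name `concat_features`, the toy values and the feature dimensions are all illustrative assumptions and are not taken from the paper.

```python
from typing import List

Frame = List[float]


def concat_features(mfcc: List[Frame], spatial: List[Frame]) -> List[Frame]:
    """Join each MFCC frame with the spatial feature frame at the same time step.

    Hypothetical helper: assumes both streams are already aligned frame by frame.
    """
    if len(mfcc) != len(spatial):
        raise ValueError("both feature streams need the same number of frames")
    return [m + s for m, s in zip(mfcc, spatial)]


# Toy input (made-up values): 3 frames, 4 MFCC coefficients and
# 2 spatial features per frame.
mfcc_frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
spatial_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

combined = concat_features(mfcc_frames, spatial_frames)
print(len(combined), len(combined[0]))  # 3 frames, 6 values per frame
```

Under this assumption, each time step handed to the first recurrent layer carries both feature types in one vector, so the sequence length is unchanged and only the per-frame dimension grows.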

Other Articles