Work place: Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation (KLEF), Vaddeswaram, Guntur District 522302, India
E-mail: cnu.pvvs@gmail.com
ORCID: https://orcid.org/0000-0002-4479-2594
Research Interests:
Biography
P. V. V. S. Srinivas is an Associate Professor at K L University with significant expertise in computer science and engineering, particularly in deep learning and data science. His work focuses on applying advanced computational approaches to medical diagnostics, renewable energy, and machine learning optimization problems. Dr. Srinivas has published in several prominent journals, contributing to research on topics like facial emotion recognition, photovoltaic systems, and distributed computing. His research has garnered over 400 citations, reflecting the impact of his collaborative studies. Beyond publishing, he actively mentors students, participates in peer review, and fosters interdisciplinary academic networking, positioning himself as a prominent figure in advancing computational research in academia.
By Srinivas P. V. V. S., Shaik Nazeera Khamar, Nohith Borusu, Mohan Guru Raghavendra Kota, Harika Vuyyuru, Sampath Patchigolla
DOI: https://doi.org/10.5815/ijigsp.2026.01.07, Pub. Date: 8 Feb. 2026
In affective computing research, multi-modal emotion detection has gained popularity as a way to boost recognition robustness and overcome the constraints of single-modality processing. Human emotions are characterized through a variety of modalities, including physiological indicators, facial expressions, and neuroimaging techniques. Here, a novel deep attention mechanism is used for detecting multi-modal emotions. Initially, data are collected in the form of audio and video. For dimensionality reduction, the audio features are extracted using the Constant-Q chromagram and Mel-Frequency Cepstral Coefficients (MM-FC2). After extraction, audio feature generation is carried out by a Convolutional Dense Capsule Network (Conv_DCN). For the video data, key-frame extraction is carried out using enhanced spatial-temporal and second-order Gaussian kernels; second-order Gaussian kernels are a powerful tool for extracting features from video data and converting them into a format suitable for image-based analysis. DenseNet-169 is then used for video feature generation. Finally, all the extracted features are fused, and emotions are detected using a Weighted Late Fusion Deep Attention Neural Network (WLF_DAttNN). The system is implemented in Python, achieving an accuracy of 97% on the RAVDESS dataset and 96% on the CREMA-D dataset.
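The weighted-late-fusion idea behind WLF_DAttNN can be illustrated with a minimal NumPy sketch: each modality branch produces a per-class probability vector, and the fused prediction is a weighted average of the two. Note this is only an illustration of the fusion step, not the authors' network; the modality weights and example probabilities below are hypothetical.

```python
import numpy as np

def weighted_late_fusion(audio_probs, video_probs, w_audio=0.5, w_video=0.5):
    """Fuse per-modality class probabilities by weighted averaging.

    audio_probs, video_probs: 1-D arrays of class probabilities from
    the audio and video branches. The weights are hypothetical here;
    in a learned system they would be trained jointly with the model.
    """
    fused = w_audio * np.asarray(audio_probs) + w_video * np.asarray(video_probs)
    return fused / fused.sum()  # renormalize so the fused scores sum to 1

# Example: three emotion classes, e.g. [happy, sad, angry] (illustrative values)
audio = np.array([0.7, 0.2, 0.1])
video = np.array([0.5, 0.4, 0.1])
fused = weighted_late_fusion(audio, video, w_audio=0.6, w_video=0.4)
predicted_class = int(fused.argmax())
```

In this sketch, weighting the audio branch more heavily (0.6 vs. 0.4) lets a more reliable modality dominate the decision, which is the basic motivation for learning fusion weights rather than averaging uniformly.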