Nohith Borusu

Work place: Department of Computer Science and Information Technology, K L University, Vaddeswaram, Guntur District 522302, India

E-mail: nohithborusu56@gmail.com

Website: https://orcid.org/0009-0007-1387-6692

Research Interests:

Biography

Nohith Borusu completed his B.Tech in Computer Science and Information Technology (CSIT) at KL University, specializing in Machine Learning, Data Science, and Artificial Intelligence. During his studies, he worked on a research paper on multimodal emotion recognition, focusing on how multiple data sources such as text, speech, and visuals can improve emotion detection. Currently, he works at Microland as a ServiceNow Developer in the ServiceNow Center of Excellence, where he designs catalog items, builds workflows, and develops automation solutions. His goal is to bridge AI research and enterprise IT innovation.

Author Articles
Weighted Late Fusion based Deep Attention Neural Network for Detecting Multi-Modal Emotion

By Srinivas P. V. V. S., Shaik Nazeera Khamar, Nohith Borusu, Mohan Guru Raghavendra Kota, Harika Vuyyuru, Sampath Patchigolla

DOI: https://doi.org/10.5815/ijigsp.2026.01.07, Pub. Date: 8 Feb. 2026

In affective computing research, multi-modal emotion detection has gained popularity as a way to boost recognition robustness and overcome the constraints of relying on a single type of data. Human emotions can be recognized through a variety of modalities, including physiological indicators, facial expressions, and neuroimaging techniques. Here, a novel deep attention mechanism is used for detecting multi-modal emotions. Initially, data are collected as audio and video features. For dimensionality reduction, the audio features are extracted using the Constant-Q chromagram and Mel-Frequency Cepstral Coefficients (MM-FC2). After extraction, the audio representation is generated by a Convolutional Dense Capsule Network (Conv_DCN). For the video data, key frame extraction is carried out using enhanced spatial-temporal and Second-Order Gaussian kernels; Second-Order Gaussian kernels are a powerful tool for extracting features from video data and converting them into a format suitable for image-based analysis. Next, DenseNet-169 is used to generate the video representation. Finally, all the extracted features are fused, and emotions are detected using a Weighted Late Fusion Deep Attention Neural Network (WLF_DAttNN). The method is implemented in Python and achieves an accuracy of 97% on the RAVDESS dataset and 96% on the CREMA-D dataset.
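The abstract describes a two-branch pipeline: audio features (Constant-Q chromagram and MFCC) and video features, combined by weighted late fusion. The sketch below is only a rough illustration of those two ideas in Python using librosa and NumPy; the function names, the time-averaging of features, and the fixed fusion logits are assumptions for illustration and are not the authors' implementation of Conv_DCN or WLF_DAttNN.

```python
# Minimal sketch (not the paper's code): constant-Q chroma + MFCC audio features
# and a softmax-weighted late fusion of per-modality class probabilities.
import numpy as np
import librosa


def extract_audio_features(path, n_mfcc=40):
    """Return a fixed-length vector of MFCC and constant-Q chroma statistics."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)          # shape (12, T)
    # Time-average each coefficient so clips of different lengths are comparable.
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])


def weighted_late_fusion(prob_audio, prob_video, logits=(0.0, 0.0)):
    """Combine per-modality class probabilities with softmax-normalised weights.

    `logits` stands in for learned fusion parameters; in the paper these would
    come from the attention network rather than being fixed constants.
    """
    w = np.exp(logits) / np.sum(np.exp(logits))
    return w[0] * np.asarray(prob_audio) + w[1] * np.asarray(prob_video)


if __name__ == "__main__":
    # Toy example: the two modalities disagree and the fusion weights arbitrate.
    p_audio = [0.7, 0.2, 0.1]   # e.g. happy / sad / angry
    p_video = [0.4, 0.5, 0.1]
    print(weighted_late_fusion(p_audio, p_video, logits=(1.0, 0.5)))
```

In this toy setup, giving the audio branch a larger logit pulls the fused prediction toward its output; in the paper the weighting is learned jointly with the deep attention network rather than set by hand.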

Other Articles