Deep Learning Network and Renyi-entropy Based Fusion Model for Emotion Recognition Using Multimodal Signals

Full Text (PDF, 1609KB), PP.67-84



Jaykumar M. Vala 1,*, Udesang K. Jaliya 2,3

1. Computer/IT Engineering, Gujarat Technological University, Chandkheda, Gandhinagar, Gujarat 382424, India

2. Gujarat Technological University, Chandkheda, Gandhinagar, Gujarat 382424, India

3. Department of Computer Engineering, BVM Engineering College, V.V. Nagar, Gujarat 388120, India

* Corresponding author.


Received: 29 Dec. 2021 / Revised: 17 Feb. 2022 / Accepted: 10 Apr. 2022 / Published: 8 Aug. 2022

Index Terms

Multimodal emotion recognition, facial expression, EEG signal, physiological signal, deep learning.


Abstract

Emotion recognition is a significant research topic for interactive intelligent systems, with a wide range of applications in tasks such as education, social media analysis, and customer service. It is the process of automatically perceiving a user's emotional response to multimedia content through implicit annotation. With the advent of speech recognition and computer vision, research on emotion recognition from speech and facial-expression modalities has gained popularity in recent decades. Owing to the non-linear nature of the signals involved, emotion recognition remains a challenging task. To achieve facial emotion recognition using multimodal signals, this research proposes an effective deep learning method based on the Bat Rider Optimization Algorithm (BROA), derived by integrating the Bat Algorithm (BA) with the Rider Optimization Algorithm (ROA). The multimodal signals comprise face images, EEG signals, and physiological signals, and the features extracted from these modalities are employed for emotion recognition. The proposed method outperforms existing methods, achieving a maximum accuracy of 0.8794 and minimum FAR and FRR of 0.1757 and 0.1806, respectively.
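To make the Rényi-entropy fusion idea concrete, the sketch below shows one plausible (illustrative, not the paper's exact) rule: compute the Rényi entropy of each modality's class-score vector and weight modalities inversely to their entropy, so that a more peaked (more confident) modality contributes more to the fused decision. The function names and the inverse-entropy weighting are assumptions for illustration only.

```python
import numpy as np

def renyi_entropy(p, alpha=2.0, eps=1e-12):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha).

    As alpha -> 1 this reduces to Shannon entropy; here alpha must be
    positive and different from 1. Input is normalised to a probability
    vector first.
    """
    p = np.asarray(p, dtype=float)
    p = p / (p.sum() + eps)
    return np.log((p ** alpha).sum() + eps) / (1.0 - alpha)

def fuse_scores(modality_scores, alpha=2.0, eps=1e-12):
    """Fuse per-modality class-score vectors (e.g. from face, EEG, and
    physiological branches) by inverse-entropy weighting: lower entropy
    means a more confident modality and hence a larger weight."""
    weights = np.array([1.0 / (renyi_entropy(s, alpha) + eps)
                        for s in modality_scores])
    weights /= weights.sum()
    fused = sum(w * np.asarray(s, dtype=float)
                for w, s in zip(weights, modality_scores))
    return fused / fused.sum()
```

For example, fusing a confident face-branch score vector [0.7, 0.2, 0.1] with a flatter EEG-branch vector [0.4, 0.3, 0.3] gives the face branch the larger weight, and the fused vector keeps class 0 as the prediction.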

Cite This Paper

Jaykumar M. Vala, Udesang K. Jaliya, "Deep Learning Network and Renyi-entropy Based Fusion Model for Emotion Recognition Using Multimodal Signals", International Journal of Modern Education and Computer Science (IJMECS), Vol. 14, No. 4, pp. 67-84, 2022. DOI: 10.5815/ijmecs.2022.04.06

