A Novel Multimodal Sarcasm Detection Methodology with Emotion Recognition Using E-RS-GRU and KLKI-FUZZY Techniques

Author(s)

Ravi Teja Gedela 1, J. N. V. R. Swarup Kumar 1, Venkateswararao Kuna 1, Sasibhushana Rao Pappu 1,*

1. Department of Computer Science and Engineering, GITAM School of Technology, GITAM (Deemed to be University), Visakhapatnam, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2025.06.08

Received: 11 Dec. 2024 / Revised: 25 Feb. 2025 / Accepted: 20 Apr. 2025 / Published: 8 Dec. 2025

Index Terms

Sarcasm Detection, Emotion Classification, Frequency Spectral Analysis, Feature Fusion, Feature Optimization

Abstract

Sarcasm, a subtle form of expression, is challenging to detect, especially on modern communication platforms where communication transcends text to encompass videos, images, and audio. Traditional sarcasm detection methods rely solely on textual data and often fail to capture the nuanced emotional inconsistencies inherent in sarcastic remarks. To overcome these shortcomings, this paper introduces a novel multimodal framework that incorporates text, audio, and emoji data for more effective sarcasm detection and emotion classification. A key component of this framework is the Contextualized Semantic Self-Guided BERT (CS-SGBERT) model, which generates efficient word embeddings. First, frequency spectral analysis is performed on the audio data, followed by preprocessing and feature extraction, while the text data undergoes preprocessing to extract lexicon and irony features. Meanwhile, emojis are analyzed for polarity scores; together, these modalities yield a rich set of multimodal features. The features are then fused and optimized using the Canberra-based Dingo Optimization Algorithm (C-DOA). The selected features, along with the embedded words from the preprocessed texts, are fed to an Entropy-based Robust Scaling Gated Recurrent Unit (E-RS-GRU) model for sarcasm detection. Experimental results on the MUStARD dataset show that the proposed E-RS-GRU model achieves an accuracy of 76.65% and an F1-score of 76.9%, a relative improvement of 2.18% over the best-performing baseline and 1.25% over the best-performing state-of-the-art model. Additionally, a KLKI-Fuzzy model is proposed for emotion recognition; it dynamically adjusts membership functions through Kullback-Leibler Kriging Interpolation (KLKI), enhancing emotion classification by processing features from all modalities. The KLKI-Fuzzy model delivers improved emotion recognition performance with reduced fuzzification and defuzzification times.
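
As a rough sketch of the moving parts named above, the Python snippet below shows the standard Canberra distance (a plausible basis for the C-DOA fitness, since the abstract does not detail it), one possible reading of entropy-based robust scaling (median/IQR scaling weighted by per-feature Shannon entropy), and a plain GRU classification head. The names entropy_robust_scale and GRUSarcasmClassifier, the weighting scheme, and all dimensions are illustrative assumptions, not the authors' implementation.

import numpy as np
import torch
import torch.nn as nn

def canberra(x: np.ndarray, y: np.ndarray) -> float:
    """Canberra distance: sum_i |x_i - y_i| / (|x_i| + |y_i|)."""
    denom = np.abs(x) + np.abs(y)
    mask = denom > 0  # a term is taken as 0 when both coordinates are 0
    return float(np.sum(np.abs(x - y)[mask] / denom[mask]))

def entropy_robust_scale(X: np.ndarray, bins: int = 10) -> np.ndarray:
    """Hypothetical 'entropy-based robust scaling': median/IQR scaling,
    with each feature weighted by its normalized Shannon entropy."""
    med = np.median(X, axis=0)
    iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
    iqr[iqr == 0] = 1.0                     # guard constant features
    Z = (X - med) / iqr
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]
        weights[j] = -(p * np.log2(p)).sum() / np.log2(bins)  # in [0, 1]
    return Z * weights

class GRUSarcasmClassifier(nn.Module):
    """Plain GRU head over fused multimodal feature sequences."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)     # binary: sarcastic vs. not

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(x)                  # h: (num_layers, batch, hidden)
        return self.out(h[-1])              # logits, shape (batch, 1)

# Toy usage: scale fused features, then score length-1 "utterance" sequences.
X = np.random.randn(32, 48).astype(np.float32)   # 32 samples, 48 fused features
Xs = entropy_robust_scale(X).astype(np.float32)
model = GRUSarcasmClassifier(in_dim=48)
logits = model(torch.from_numpy(Xs).unsqueeze(1))
print(logits.shape, canberra(X[0], X[1]))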

Cite This Paper

Ravi Teja Gedela, J. N. V. R. Swarup Kumar, Venkateswararao Kuna, Sasibhushana Rao Pappu, "A Novel Multimodal Sarcasm Detection Methodology with Emotion Recognition Using E-RS-GRU and KLKI-FUZZY Techniques", International Journal of Modern Education and Computer Science (IJMECS), Vol.17, No.6, pp. 111-126, 2025. DOI: 10.5815/ijmecs.2025.06.08
