IJIEEB Vol. 18, No. 3, 8 Jun. 2026
Keywords: Sarcasm Detection, Multi-Modal Learning, Cross-Attention, Sentiment Analysis, Emoji Integration, Transformer Models
In the era of social media-driven communication, sarcasm poses a major challenge for automated sentiment analysis systems, particularly on platforms such as Twitter, where text is brief and often contextually ambiguous. Misinterpreting sarcastic content can degrade the reliability of downstream analytics, including opinion mining and content moderation. To address this challenge, we propose a multi-modal transformer-based approach to sarcasm detection that integrates textual and emoji information through a cross-attention mechanism. The proposed model uses RoBERTa to generate contextualized text embeddings, while emojis are encoded with Emoji-BERT to capture emoji-specific semantic and emotional cues. A Gated-LSTM layer models sequential dependencies among emojis, and the cross-attention mechanism dynamically aligns emoji representations with textual features to improve sarcasm recognition. The fused representations are then passed to a fully connected classification layer that predicts sarcasm. The model is evaluated against baseline and state-of-the-art models using standard metrics: accuracy, precision, recall, and F1-score. Experimental results show that the proposed approach outperforms several baseline and state-of-the-art models, achieving an accuracy of 92.5%, precision of 91.8%, recall of 93.2%, and an F1-score of 92.5%. These results indicate that jointly modeling the textual and emoji modalities improves sarcasm detection in social media content, and they illustrate the potential of the approach for sarcasm-aware sentiment analysis in social media analytics and automated content moderation systems.
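The cross-attention fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that text vectors act as queries and emoji vectors act as keys and values, and it uses plain scaled dot-product attention with no learned projections. The RoBERTa and Emoji-BERT encoders and the Gated-LSTM layer are omitted and replaced by hypothetical 2-dimensional toy vectors. The function names `cross_attention` and `fuse`, and the choice of concatenation as the fusion step, are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(text_vecs, emoji_vecs):
    """Scaled dot-product cross-attention (assumed direction: text -> emoji).

    Each text vector is a query; emoji vectors serve as keys and values.
    Returns the attention weights and one emoji context vector per text vector.
    """
    dim = len(emoji_vecs[0])
    scale = math.sqrt(dim)
    weights, contexts = [], []
    for query in text_vecs:
        row = softmax([dot(query, key) / scale for key in emoji_vecs])
        context = [
            sum(w * value[i] for w, value in zip(row, emoji_vecs))
            for i in range(dim)
        ]
        weights.append(row)
        contexts.append(context)
    return weights, contexts


def fuse(text_vecs, contexts):
    """Concatenate each text vector with its emoji context vector.

    Concatenation is one simple fusion choice, assumed here for illustration.
    """
    return [text + context for text, context in zip(text_vecs, contexts)]


# Hypothetical toy embeddings standing in for encoder outputs.
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
emoji_vecs = [[2.0, 0.0], [0.0, 2.0]]

weights, contexts = cross_attention(text_vecs, emoji_vecs)
fused = fuse(text_vecs, contexts)
```

In this toy setup each text vector places most of its attention on the emoji vector pointing in the same direction, and each fused vector has twice the input dimension. In a full model the fused vectors would feed a classification layer, as the abstract describes.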
Shaikh Ambreen Mohd Ibrahim, Manoj M. Deshpande, Vijaykumar N. Pawar, "A Multi-Modal Transformer Model with Gated-LSTM for Sarcasm Detection in Tweets Using Cross-Attention and Emoji Integration", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 18, No. 3, pp. 21-36, 2026. DOI: 10.5815/ijieeb.2026.03.02