Victoria Vysotska; Sofiia Popp; Viktoriia Bulatova; Zhengbing Hu; Yuriy Ushenko; Dmytro Uhryn

Smart Tool for Identifying Misinformation Spread Sources and Routes in Social Networks Based on NLP and Machine Learning

PDF (4501KB), PP.114-165


Author(s)

Victoria Vysotska ¹, Sofiia Popp ², Viktoriia Bulatova ², Zhengbing Hu ³, Yuriy Ushenko ⁴, Dmytro Uhryn ⁴

1. Information Systems and Networks Department, Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Department of Information Systems and Networks, Institute of Computer Sciences and Information Technologies, Lviv Polytechnic National University, Lviv, 79013, Ukraine

3. School of Computer Science, Hubei University of Technology, Wuhan, China

4. Department of Computer Science of the Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2025.05.08

Received: 18 May 2025 / Revised: 9 Jul. 2025 / Accepted: 27 Aug. 2025 / Published: 8 Oct. 2025

Index Terms

Fake News, Machine Learning, Ukrainian-language Texts, Telegram, TF-IDF, Contextual Embeddings, IBM Granite, Logistic Regression, NLP, Disinformation, Text Classification, Information Security

Abstract

This article presents a method for detecting disinformation in news texts that combines classical machine learning algorithms with deep learning models. The approach was tested on a corpus of Ukrainian- and English-language news labelled with the classes "fake" and "truth". Before modelling, the data were pre-processed in detail: duplicates were removed; HTML tags, links, and special characters were cleaned; texts were normalised; labels were unified; classes were balanced; and texts were tokenised. Vectorisation was hybrid: frequency features (TF-IDF) were combined with contextual vector representations from the IBM Granite multilingual model. Logistic regression was chosen as the classifier because it balances classification quality against interpretability of the results. Performance was assessed with standard metrics: Accuracy, Precision, Recall, F1-score, and ROC-AUC. In the experiments, the model achieved an Accuracy of 0.91–0.93, a Precision of 0.89, a Recall of 0.92, an F1-score of 0.90, and an ROC-AUC above 0.94. These values indicate that the system classifies news accurately while keeping false positives low, which is especially important in conditions of information warfare. High Recall is prioritised, because missing a fake message can have critical consequences for information security. The proposed approach thus contributes to automated disinformation detection by combining transparent, reproducible data processing with a hybrid text representation. The novelty of the study lies in adapting NLP and machine learning methods to the Ukrainian-language information space and to the context of modern hybrid warfare, which makes it possible to effectively identify the sources and routes by which fake news spreads.
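The pre-processing steps named in the abstract (duplicate removal, cleaning of HTML tags, links and special characters, normalisation, label unification, class balancing, tokenisation) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the label aliases, the regular expressions, the choice of NFKC normalisation with lowercasing, and balancing by random undersampling are all assumptions made here for illustration, and every function name is hypothetical.

```python
import html
import random
import re
import unicodedata
from collections import defaultdict

# Hypothetical label aliases; the paper's actual label vocabulary is not given.
LABEL_ALIASES = {
    "fake": "fake", "false": "fake", "0": "fake",
    "truth": "truth", "true": "truth", "real": "truth", "1": "truth",
}

TAG_RE = re.compile(r"<[^>]+>")                  # HTML tags
URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # links
# Anything that is not a letter, digit, whitespace or apostrophe
# (apostrophes occur inside Ukrainian words), plus the underscore.
SPECIAL_RE = re.compile(r"[^\w\s'’]|_")
TOKEN_RE = re.compile(r"[^\W_]+(?:['’][^\W_]+)*")


def clean_text(raw):
    """Strip tags, entities, links and special characters; normalise case."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    text = unicodedata.normalize("NFKC", text).lower()
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())


def normalize_label(raw_label):
    """Map label variants onto 'fake'/'truth'; unknown labels raise KeyError."""
    return LABEL_ALIASES[str(raw_label).strip().lower()]


def tokenize(text):
    """Split cleaned text into word tokens, keeping in-word apostrophes."""
    return TOKEN_RE.findall(text)


def preprocess(records, seed=0):
    """Clean, deduplicate, relabel, balance and tokenise (text, label) pairs."""
    seen = set()
    by_label = defaultdict(list)
    for raw_text, raw_label in records:
        text = clean_text(raw_text)
        if not text or text in seen:   # drop empty texts and duplicates
            continue
        seen.add(text)
        by_label[normalize_label(raw_label)].append(text)
    if not by_label:
        return []
    # Assumed balancing strategy: undersample every class to the smallest one.
    n = min(len(texts) for texts in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label, texts in sorted(by_label.items()):
        for text in rng.sample(texts, n):
            balanced.append((tokenize(text), label))
    return balanced


if __name__ == "__main__":
    sample = [
        ("<p>Breaking: visit http://x.io NOW!!!</p>", "FAKE"),
        ("Breaking  visit now", "false"),   # duplicate once cleaned
        ("Офіційне повідомлення п'ять", "True"),
        ("Another real report", "real"),
    ]
    for tokens, label in preprocess(sample):
        print(label, tokens)
```

In a pipeline like the one the abstract describes, the cleaned texts would then be vectorised (TF-IDF features alongside contextual embeddings) and passed to the classifier; those stages are omitted here because they depend on external libraries and model weights.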

Cite This Paper

Victoria Vysotska, Sofiia Popp, Viktoriia Bulatova, Zhengbing Hu, Yuriy Ushenko, Dmytro Uhryn, "Smart Tool for Identifying Misinformation Spread Sources and Routes in Social Networks Based on NLP and Machine Learning", International Journal of Computer Network and Information Security (IJCNIS), Vol.17, No.5, pp.114-165, 2025. DOI:10.5815/ijcnis.2025.05.08


International Journal of Computer Network and Information Security (IJCNIS)