Smart Tool for Text Content Analysis to Identify Key Propaganda Narratives and Disinformation in News Based on NLP and Machine Learning

PDF (8629KB), PP.113-175

Author(s)

Maryna Nyzova 1, Victoria Vysotska 1, Lyubomyr Chyrun 2, Zhengbing Hu 3, Yuriy Ushenko 4,5,*, Dmytro Uhryn 5

1. Information Systems and Networks Department, Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Applied Mathematics Department, Ivan Franko National University of Lviv, Lviv, 79000, Ukraine

3. School of Computer Science, Hubei University of Technology, Wuhan, China

4. Department of Physics, Shaoxing University, Shaoxing, Zhejiang Province 312000, China

5. Department of Computer Science of the Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2025.04.08

Received: 22 Jan. 2025 / Revised: 16 Mar. 2025 / Accepted: 14 May 2025 / Published: 8 Aug. 2025

Index Terms

Propaganda, Disinformation, BERT, Narrative, Browser Plugin, Ukrainian Text, NLP, Media Literacy, News Content, Text Classification, Information Security and Artificial Intelligence

Abstract

The paper presents a smart tool for automated analysis of news text that identifies propaganda narratives and disinformation. The work is motivated by the growing information threat in the context of hybrid war, particularly in the Ukrainian information space. The proposed solution is implemented as a browser plugin that analyses content instantly, without requiring the user to switch to third-party services. The methodology relies on modern natural language processing (NLP) and deep learning methods, in particular transformer models such as BERT, to classify content by its level of propaganda impact and to identify key narratives. For propaganda classification, pre-trained GloVe vectors optimised for news articles were used, as they gave the best results among the options considered. For narrative classification, the BERT model was used instead, as it showed higher accuracy on texts that reflect subjective opinions. A multilingual version of BERT (multilingual BERT) was adopted because it works effectively with Ukrainian-language data, a key advantage for localised analysis in the context of information warfare. Before BERT was applied, the texts were pre-processed and augmented with syntactic, punctuation, emotional and stylistic features, which increased classification accuracy.
For a more complete and reliable assessment of the propaganda and narrative classification models, a set of key metrics was used (values given as propaganda/narratives): Accuracy (0.94/0.86), Precision (0.95/0.69), Recall (0.96/0.71) and F1-score (0.96/0.70). The developed model achieved an F1-score of 0.96 for propaganda classification and 0.70 for narrative classification, which exceeds the results of comparable approaches, in particular XGBoost (0.92 and 0.50, respectively). In addition, the system fully supports Ukrainian-language content, which is its key competitive advantage. The tool can be applied in journalism, fact-checking, analytics, and media literacy education for citizens, contributing to the state's information security.
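
As a reading aid for the reported metrics, the following minimal sketch recomputes the F1-score as the harmonic mean of precision and recall, using the precision/recall pairs quoted in the abstract. It is an illustration only, not the authors' evaluation code: the helper name `f1_score` and the dictionary layout are assumptions. The inputs are already rounded to two decimals, and the abstract does not say how scores were averaged across classes, so the recomputed values may differ from the reported F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract; the dictionary keys
# are illustrative labels, not identifiers from the paper.
reported = {
    "propaganda": (0.95, 0.96),
    "narratives": (0.69, 0.71),
}

# Recompute F1 for each task from the rounded precision/recall values.
f1_values = {task: f1_score(p, r) for task, (p, r) in reported.items()}

for task, f1 in f1_values.items():
    print(f"{task}: F1 = {f1:.3f}")
```

Since the harmonic mean is pulled toward the smaller of its two inputs, a single F1 value summarises the balance between false positives and false negatives, which is why it is reported alongside accuracy for both tasks.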

Cite This Paper

Maryna Nyzova, Victoria Vysotska, Lyubomyr Chyrun, Zhengbing Hu, Yuriy Ushenko, Dmytro Uhryn, "Smart Tool for Text Content Analysis to Identify Key Propaganda Narratives and Disinformation in News Based on NLP and Machine Learning", International Journal of Computer Network and Information Security (IJCNIS), Vol.17, No.4, pp.113-175, 2025. DOI:10.5815/ijcnis.2025.04.08
