Kyrylo Smelyakov

Work place: Department of Software Engineering, Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine

E-mail: kyrylo.smelyakov@nure.ua

Research Interests: Artificial Intelligence

Biography

Kyrylo Smelyakov is currently the Head of the Department of Software Engineering at Kharkiv National University of Radio Electronics (NURE, https://nure.ua/en/staff/kyrylo-smelyakov). In 2012, he completed his doctoral degree in Technical Sciences, specializing in "Mathematical modelling and numerical methods", with a dissertation titled "Models and methods of irregular objects images segmentation for off-line machine vision systems". In 2014, he received a professor's certificate from the Department of Mathematics and Software of ACS at Kharkiv National University of Air Force. He has authored over 150 publications, and his primary research interests are information technology, artificial intelligence, machine learning, computer vision, mathematical modelling, and numerical methods. His work also covers natural language processing (NLP), text processing and recognition, and data science.

Author Articles
Information Technology for Modelling Social Trends in Telegram Using E5 Vectors and Hybrid Cluster Analysis

By Roman Lynnyk, Victoria Vysotska, Zhengbing Hu, Dmytro Uhryn, Liliia Diachenko, Kyrylo Smelyakov

DOI: https://doi.org/10.5815/ijitcs.2025.04.07, Pub. Date: 8 Aug. 2025

The article presents a modern approach to analysing public opinion based on Ukrainian-language content from Telegram channels. It describes a hybrid clustering method that combines the HDBSCAN and K-means algorithms to analyse vectorised Ukrainian-language social media posts and detect public opinion trends. The methodology relies on a multilingual neural network-based text vectorisation model, which represents the semantic content of posts effectively. Experiments conducted on a corpus of 90 Ukrainian-language messages (collected between March and May 2025) identified six principal thematic clusters reflecting key areas of public discourse. Although the corpus is small (90 messages), the sample is structured and balanced by topic (news, vacancies, gaming), which makes it possible to test the proposed methodology under limited-data conditions. This approach is appropriate for analysing short texts in low-resource languages, where large-scale corpora are not available. A particular advantage of the approach is its use of semantic vector representations and term co-occurrence networks, which demonstrate a stable topological structure even with small amounts of data. This makes it possible to identify latent topic patterns and coherent clusters that have the potential to scale to broader corpora. The authors acknowledge the limitations associated with sample size, but emphasise the role of this study as a pilot stage for the development of a universal, linguistically adaptive method for analysing public discourse. In the future, the corpus is to be expanded to increase the representativeness and accuracy of the conclusions. The paper proposes a hybrid method of automatic thematic cluster analysis of short texts in social media, in particular Telegram.
Vectorisation of Ukrainian-language messages is implemented using the transformer model multilingual-e5-large-instruct. A combination of the HDBSCAN and K-means algorithms was used to detect clusters. More than 36,000 messages from three Telegram channels (news, games, vacancies) were analysed, and six main thematic clusters were identified. To identify thematic trends, HDBSCAN was applied at the first stage to find dense clusters and flag "noise" points, after which K-means was used to reassign the residual ("noise") embeddings to the nearest cluster centres.
This two-tier strategy combines the flexible detection of arbitrarily shaped clusters offered by HDBSCAN with the stable classification of less pronounced groups through K-means. It is especially effective when working with the fragmented short texts of social networks. To validate the quality of clustering, both visualisation tools (PCA, t-SNE, word clouds) and quantitative metrics were used: Silhouette Score (0.41) and Davies-Bouldin index (0.78), which indicate moderate cohesion and separation of clusters. The high level of "noise" in HDBSCAN (34.2%) was analysed separately; it may be due to the specifics of short texts, model parameters, or the stylistic fragmentation of Telegram messages. The results show the effectiveness of combining modern vectorisation models with flexible clustering methods to identify structured topics in fragmented Ukrainian-language social network content. The proposed approach has the potential to extend to other sources, types of discourse, and tasks of digital sociology. As a result of processing 90 messages received from three different channels (news, gaming content, and vacancies), six main thematic clusters were identified. The largest shares belong to clusters related to employment (28.2%) and security-patriotic topics (24.7%). The average level of "noise" after the initial HDBSCAN clustering was 34.2%. Additional analysis revealed that post lengths varied significantly, ranging from short announcements (10 words on average) to analytical texts (over 140 words). Visualisations (timelines, PCA, t-SNE, word clouds, term co-occurrence graphs) confirm the thematic coherence of clusters and reveal changes in thematic priorities over time. The proposed system is an effective tool for detecting information trends in short, fragmented texts and can be used to monitor public sentiment in low-resource languages.
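The second stage described above, in which "noise" embeddings are reassigned to the nearest cluster centre, can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `reassign_noise`, the toy 2-D points, the hand-written first-stage labels (standing in for HDBSCAN output, where -1 marks noise), and the use of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def reassign_noise(embeddings, labels):
    """Assign every point labelled -1 to the nearest cluster centroid.

    Hypothetical helper: mirrors a single K-means assignment step using
    centroids computed from the clusters found in the first stage.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels).copy()

    cluster_ids = np.unique(labels[labels != -1])
    if cluster_ids.size == 0:
        return labels  # no clusters to attach the noise points to

    # One centroid per first-stage cluster (mean of its member vectors).
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in cluster_ids]
    )

    noise_idx = np.flatnonzero(labels == -1)
    if noise_idx.size:
        # Euclidean distance from each noise point to each centroid
        # (an assumption; a cosine-based distance is another option).
        diffs = embeddings[noise_idx, None, :] - centroids[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        labels[noise_idx] = cluster_ids[dists.argmin(axis=1)]
    return labels


# Toy data: two tight groups plus two points flagged as noise (-1).
points = np.array(
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.4, 0.3], [4.0, 4.5]]
)
first_stage = np.array([0, 0, 1, 1, -1, -1])  # stand-in for HDBSCAN labels

final = reassign_noise(points, first_stage)
noise_share = float(np.mean(first_stage == -1))  # share of noise before stage two
```

In a fuller pipeline, the first-stage labels would come from a density-based clusterer run on the message embeddings, and the final labels could then be scored with functions such as `silhouette_score` and `davies_bouldin_score` from scikit-learn.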

Other Articles