Work place: Department of Software Engineering, Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine
E-mail: kyrylo.smelyakov@nure.ua
Research Interests: Artificial Intelligence
Biography
Kyrylo Smelyakov is currently the Head of the Department of Software Engineering at Kharkiv National University of Radio Electronics (NURE, https://nure.ua/en/staff/kyrylo-smelyakov). In 2012, he completed his doctoral degree in Technical Sciences, specializing in "Mathematical modelling and numerical methods", with a dissertation titled "Models and methods of irregular objects images segmentation for off-line machine vision systems". In 2014, he received a professor's certificate from the Department of Mathematics and Software of ACS at Kharkiv National University of Air Force. Kyrylo has authored over 150 publications, and his primary research interests focus on information technology, artificial intelligence, machine learning, computer vision, mathematical modelling, and numerical methods. His work also encompasses natural language processing (NLP), text processing and recognition, and data science.
By Diana Vyshotravka, Victoria Vysotska, Zhengbing Hu, Dmytro Uhryn, Yuriy Ushenko, Kyrylo Smelyakov
DOI: https://doi.org/10.5815/ijisa.2025.05.07, Pub. Date: 8 Oct. 2025
This study presents a deep learning-based approach to automated resume and job matching that uses semantic similarity between texts. The solution is based on SimCSE RoBERTa transformer embeddings and a Siamese neural architecture trained with the MSELoss function. Unlike traditional keyword-based or attribute-based filtering systems, the proposed model learns to place semantically compatible resume-vacancy pairs close together in a common vector space, capturing deep semantic alignment between resumes and job descriptions. To evaluate the effectiveness of this architecture, we conducted extensive experiments on a labelled dataset of over 7,000 resume-vacancy pairs obtained from the HuggingFace repository. The dataset includes three classes (Good Fit, Potential Fit, No Fit), which we restructured into a binary classification task. Annotation labels reflect textual compatibility based on skills, responsibilities, and experience, ensuring task relevance.
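The training signal described above (the cosine similarity between a resume embedding and a vacancy embedding regressed toward a binary fit label with an MSE loss) can be sketched in plain Python. In the actual system the embeddings come from the SimCSE RoBERTa encoder, which is not reproduced here; the vectors and function names below are illustrative only.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mse_loss(pairs, labels):
    # pairs:  list of (resume_embedding, vacancy_embedding) tuples
    # labels: 1.0 for a fitting pair, 0.0 for a non-fitting pair
    # The Siamese setup pushes cosine similarity toward the label.
    preds = [cosine(r, v) for r, v in pairs]
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
```

A perfectly aligned pair labelled 1.0 and an orthogonal pair labelled 0.0 both contribute zero loss, which is exactly the geometry the training objective encourages.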
This restructuring resulted in a moderately imbalanced dataset with approximately 66% positive and 34% negative examples; labels were assigned based on semantic compatibility, including skill match, job responsibilities, and experience alignment. To justify the use of a complex Siamese architecture, the system was compared to two baselines: (1) a classical TF-IDF + cosine similarity method, and (2) a pretrained Sentence-BERT model without task-specific fine-tuning. Our model achieved accuracy = 72%, precision = 70%, recall = 74%, F1-score = 72%, and Precision@10 = 75%, significantly outperforming both baselines across all evaluation metrics and confirming that the architecture's complexity translates into meaningful performance gains for candidate ranking and selection. A basic self-learning mechanism is implemented and functional: recruiters can provide binary feedback (Fit / No Fit) for each recommended candidate, which is stored in a feedback table and can be used to retrain or fine-tune the model periodically, enabling adaptive behaviour over time. While initial retraining experiments were conducted offline, full automation and continuous integration of feedback into the training pipeline remain a goal for future development. The system offers sub-5-second response times, integration with vector databases, and a web-based user interface. It is designed for use in HR departments, recruiting agencies, and employment platforms, with potential for broader commercial deployment and domain adaptation.
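For context, the first baseline (TF-IDF + cosine similarity) can be reproduced in a few lines of dependency-free Python. The smoothed-IDF weighting shown here is one common convention, not necessarily the exact variant used in the study:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Build TF-IDF vectors over a shared vocabulary (smoothed IDF)
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    df = {t: sum(1 for toks in tokenised if t in toks) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vecs = []
    for toks in tokenised:
        tf = Counter(toks)
        vecs.append([tf[t] / len(toks) * idf[t] for t in vocab])
    return vecs

def cosine(u, v):
    # Cosine similarity between two TF-IDF vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Because this baseline only scores lexical overlap, a resume and a vacancy with no shared terms score zero regardless of semantic relatedness, which is precisely the weakness the embedding-based model addresses.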
While UI and vector retrieval infrastructure were developed to support prototyping and deployment, the primary research innovation centres on the modelling framework, learning setup, and comparative evaluation methodology. This work contributes to the advancement of semantically aware intelligent recruiting systems and offers a replicable baseline for future studies in neural recommendation for HR applications. The risks of algorithmic bias are emphasised separately: even in the absence of obvious demographic characteristics in the input data, the model can implicitly reproduce social or historical inequalities inherent in the data. In this regard, the study outlines areas for further development, in particular equity auditing, bias reduction techniques, and the integration of human validation into decision-making.
By Roman Lynnyk, Victoria Vysotska, Zhengbing Hu, Dmytro Uhryn, Liliia Diachenko, Kyrylo Smelyakov
DOI: https://doi.org/10.5815/ijitcs.2025.04.07, Pub. Date: 8 Aug. 2025
The article presents a modern approach to analysing public opinion based on Ukrainian-language content from Telegram channels: a hybrid clustering method that combines the HDBSCAN and K-means algorithms to analyse vectorised social media posts and detect public opinion trends. The methodology relies on a multilingual neural network-based text vectorisation model, which enables effective representation of the semantic content of posts. Experiments conducted on a corpus of 90 Ukrainian-language messages (collected between March and May 2025) identified six principal thematic clusters reflecting key areas of public discourse. Despite the small volume of the corpus, the sample is structured and balanced by topic (news, vacancies, gaming), which makes it possible to test the effectiveness of the proposed methodology under conditions of limited data. This approach is appropriate for the analysis of short texts in low-resource languages, where large-scale corpora are not available. A particular advantage of the approach is the use of semantic vector representations and the construction of term co-occurrence networks, which demonstrate a stable topological structure even with small amounts of data. This makes it possible to identify latent topic patterns and coherent clusters with the potential to scale to broader corpora. The authors acknowledge the limitations associated with the sample size, but emphasise the role of this study as a pilot stage in the development of a universal, linguistically adaptive method for analysing public discourse. In the future, it is planned to expand the corpus to increase the representativeness and accuracy of the conclusions.
Vectorisation of Ukrainian-language messages is implemented using the transformer model multilingual-e5-large-instruct. More than 36,000 messages from three Telegram channels (news, games, vacancies) were analysed, and six main thematic clusters were identified. To identify thematic trends, a hybrid clustering approach was used: at the first stage, the HDBSCAN algorithm identified dense clusters and flagged "noise" points, after which K-means was used to reassign the residual ("noise") embeddings to the nearest cluster centres.
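The second stage of this hybrid scheme (reassigning HDBSCAN noise points to the nearest cluster centre) can be sketched in plain Python. The labels, with -1 marking noise, are assumed to come from the hdbscan library on the message embeddings; neither is reproduced here.

```python
def reassign_noise(embeddings, labels):
    # Stage 2 of the hybrid scheme: points labelled -1 ("noise") by
    # HDBSCAN are reassigned, K-means style, to the nearest centroid
    # of the non-noise clusters found in stage 1.
    clusters = sorted({l for l in labels if l != -1})
    centroids = {}
    for c in clusters:
        members = [e for e, l in zip(embeddings, labels) if l == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def dist2(u, v):
        # Squared Euclidean distance (monotone in distance, so fine for argmin)
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [l if l != -1 else min(clusters, key=lambda c: dist2(e, centroids[c]))
            for e, l in zip(embeddings, labels)]
```

In practice the centroids could equally be taken from a fitted scikit-learn KMeans model; the point is that every fragment ends up in some thematic cluster instead of being discarded as noise.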
Such a two-tier strategy makes it possible to combine the advantages of HDBSCAN's flexible detection of free-form clusters with the stable classification of less pronounced groups by K-means, and is especially effective when working with the fragmented short texts of social networks. To validate the quality of clustering, both visualisation tools (PCA, t-SNE, word clouds) and quantitative metrics were used: the Silhouette Score (0.41) and the Davies-Bouldin index (0.78), which indicate moderate cluster coherence and separation. The high level of "noise" after the initial HDBSCAN clustering (34.2%) was analysed separately; it may be due to the specifics of short texts, model parameters, or the stylistic fragmentation of Telegram messages. The results show the effectiveness of combining modern vectorisation models with flexible clustering methods for identifying structured topics in fragmented Ukrainian-language social media content, and the proposed approach has the potential to be extended to other sources, types of discourse, and tasks of digital sociology. Among the six thematic clusters identified from 90 messages received from three different channels (news, gaming content, and vacancies), the largest share is occupied by clusters related to employment (28.2%) and security-patriotic topics (24.7%). Additional analysis revealed that post lengths varied significantly, ranging from short announcements (an average of 10 words) to analytical texts (over 140 words). Visualisations (timelines, PCA, t-SNE, word clouds, term co-occurrence graphs) confirm the thematic coherence of the clusters and reveal changes in thematic priorities over time. The proposed system is an effective tool for detecting information trends in an environment of short, fragmented texts and can be used to monitor public sentiment in low-resource languages.
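As an illustration of the internal validation metrics mentioned above, here is a minimal dependency-free sketch of the silhouette coefficient; in practice one would use scikit-learn's silhouette_score and davies_bouldin_score on the embedding matrix.

```python
import math

def silhouette(points, labels):
    # Mean silhouette coefficient: for each point, s = (b - a) / max(a, b),
    # where a = mean distance to points in its own cluster and
    # b = mean distance to the nearest other cluster.
    def d(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    clusters = sorted(set(labels))
    scores = []
    for i, (p, l) in enumerate(zip(points, labels)):
        same = [d(p, q) for j, (q, m) in enumerate(zip(points, labels))
                if m == l and j != i]
        a = sum(same) / len(same) if same else 0.0
        b = min(
            sum(d(p, q) for q, m in zip(points, labels) if m == c)
            / sum(1 for m in labels if m == c)
            for c in clusters if c != l
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores near 1 indicate tight, well-separated clusters; a value of 0.41, as reported above, reflects the moderate coherence expected from short, stylistically fragmented posts.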