Diana Vyshotravka

Work place: Department of Information Systems and Networks, Institute of Computer Sciences and Information Technologies, Lviv Polytechnic National University, Lviv, 79013, Ukraine

E-mail: vyshotravka.diana.sa.2021@lpnu.ua

Website:

Research Interests: Data Analysis

Biography

Diana Vyshotravka is a final-year bachelor's student in the Department of Information Systems and Networks at Lviv Polytechnic National University. She is interested in data analysis and in building machine learning and deep learning models to solve various problems.

Author Articles
Smart Application for Recruiting Based on Natural Language Processing Methods, Transformer Models and Siamese Neural Network Architecture

By Diana Vyshotravka, Victoria Vysotska, Zhengbing Hu, Dmytro Uhryn, Yuriy Ushenko, Kyrylo Smelyakov

DOI: https://doi.org/10.5815/ijisa.2025.05.07, Pub. Date: 8 Oct. 2025

This study presents a deep learning-based approach to automated resume and job matching that uses semantic similarity between texts. The solution is based on SimCSE RoBERTa transformer embeddings and a Siamese neural architecture trained with the MSELoss loss function. Unlike traditional systems that filter or match by keywords or attributes, the proposed model learns to place semantically compatible resume–vacancy pairs in a common vector space, so that it captures deep semantic alignment between resumes and job descriptions. To evaluate the effectiveness of this architecture, we conducted extensive experiments on a labelled dataset of over 7,000 resume–vacancy pairs obtained from the HuggingFace repository. The dataset includes three classes (Good Fit, Potential Fit, No Fit), which we restructured into a binary classification task. Annotation labels reflect textual compatibility based on skills, responsibilities, and experience, ensuring task relevance.
This restructuring resulted in a moderately imbalanced dataset with approximately 66% positive and 34% negative examples. Our model achieved an accuracy of 72%, precision of 70%, recall of 74%, F1-score of 72%, and Precision@10 of 75%. To justify the use of a complex Siamese architecture, the system was compared to two baselines: (1) a classical TF-IDF + cosine similarity method, and (2) a pretrained Sentence-BERT model without task-specific fine-tuning. The proposed model significantly outperformed both baselines across all evaluation metrics, confirming that its complexity translates to meaningful performance gains and validating the empirical effectiveness of the architecture for candidate ranking and selection. A basic self-learning mechanism is implemented and functional: recruiters can provide binary feedback (Fit / No Fit) for each recommended candidate, which is stored in a feedback table. This feedback can be used to retrain or fine-tune the model periodically, enabling adaptive behaviour over time. While initial retraining experiments were conducted offline, full automation and continuous integration of feedback into training pipelines remain a goal for future development. The system offers sub-5-second response times, integration with vector databases, and a web-based user interface. It is designed for use in HR departments, recruiting agencies, and employment platforms, with potential for broader commercial deployment and domain adaptation.
While the UI and vector retrieval infrastructure were developed to support prototyping and deployment, the primary research innovation centres on the modelling framework, learning setup, and comparative evaluation methodology. This work contributes to the advancement of semantically aware intelligent recruiting systems and offers a replicable baseline for future studies in neural recommendation for HR applications. The risks of algorithmic bias are emphasised separately: even in the absence of obvious demographic characteristics in the input data, the model can implicitly reproduce social or historical inequalities inherent in the data. In this regard, the study outlines areas for further development, in particular equity auditing, bias reduction techniques, and the integration of human validation in decision-making.
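For readers unfamiliar with the classical baseline and the ranking metric named in the abstract, the following is a minimal illustrative sketch of TF-IDF + cosine similarity ranking and Precision@k in plain Python. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names (`tfidf_vectors`, `rank_resumes`, `precision_at_k`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1; this variant is an
    assumption, since the abstract does not specify the weighting scheme.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_resumes(vacancy, resumes):
    """Return resume indices sorted from most to least similar to the vacancy."""
    vectors = tfidf_vectors([vacancy] + resumes)
    scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
    return sorted(range(len(resumes)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked_indices, relevant, k):
    """Fraction of the top-k ranked items that are labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for i in ranked_indices[:k] if i in relevant)
    return hits / k
```

A call such as `rank_resumes(vacancy_text, resume_texts)` yields a ranking that can be scored with `precision_at_k(ranking, relevant_indices, 10)`. Because this baseline only rewards shared vocabulary, a resume that describes the same skills in different words scores low, which is the gap the embedding-based model described above is meant to close.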

Other Articles