Markiian-Mykhailo Paprotskyi

Work place: Department of Information Systems and Networks, Lviv Polytechnic National University, Lviv, 79013, Ukraine

E-mail: markiian-mykhailo.paprotskyi.sa.2021@lpnu.ua

ORCID: https://orcid.org/0009-0002-5292-8559

Research Interests: data science, machine learning, deep learning, large language models (LLMs)

Biography

Markiian-Mykhailo Paprotskyi is a student pursuing a Bachelor's degree at the Department of Information Systems and Networks, Lviv Polytechnic National University. He is an early-career researcher with a passion for data science, in particular machine learning, deep learning, and large language models (LLMs).

Author Articles
Information Engineering for Fake Job Postings Classification in Electronic Business Based on Machine Learning Technology

By Markiian-Mykhailo Paprotskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Zhengbing Hu, Dmytro Uhryn

DOI: https://doi.org/10.5815/ijieeb.2025.05.07, Pub. Date: 8 Oct. 2025

This study investigates the application of machine learning methods for the classification of fraudulent job postings in e-business platforms. Using the publicly available fake_job_postings.csv dataset, textual and categorical features of vacancies were processed and vectorised through TF-IDF, HashingVectorizer, and optimised TF-IDF. Eight machine learning algorithms were compared, including Logistic Regression, Random Forest, Gradient Boosting, Decision Tree, Multinomial Naive Bayes, Linear SVC, K-Nearest Neighbours, and XGBoost. The experiments demonstrate that XGBoost achieved the best performance (Accuracy = 0.990, Precision = 0.982, Recall = 0.998, F1 = 0.990) across all feature representations. Its superior results can be attributed to the ability of boosted ensembles to capture complex non-linear relationships in high-dimensional feature spaces while maintaining robustness against noise and class imbalance.
However, it should be noted that the evaluation was performed on a single static dataset. While the high recall shows the model’s ability to reliably detect fraudulent ads in this context, questions remain about its generalisability. Fraud tactics evolve rapidly, and new job scams may significantly differ from patterns in the training data. This creates a potential risk of overfitting to dataset-specific features, which limits direct transfer to real-world scenarios without continuous retraining and monitoring. The practical contribution of the study is a reproducible framework that integrates text and categorical processing, vectorisation, hyperparameter optimisation, and comparative model benchmarking. Such a framework could be embedded into online job platforms to support automated filtering of suspicious ads. Still, its deployment requires additional measures: periodic retraining with updated data, integration with platform APIs, and the inclusion of explainability modules to ensure transparency and user trust. Overall, the research demonstrates that ensemble-based models, particularly XGBoost, offer strong potential for fraud detection in the e-business labour market. At the same time, further work is necessary to validate model robustness on unseen and evolving fraudulent job posting strategies, ensuring scalability and reliability in production environments.
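For illustration, a minimal sketch of the benchmarking setup described above, using scikit-learn and XGBoost; the column names, the concatenation of text fields, and the hyperparameters are assumptions made for the sketch, not details taken from the paper.

```python
# Minimal sketch of the TF-IDF + XGBoost setup described in the abstract.
# Assumes fake_job_postings.csv has a binary "fraudulent" label and the
# listed free-text columns; names and hyperparameters are illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("fake_job_postings.csv")

# Concatenate the main textual fields into one document per posting.
text_cols = ["title", "company_profile", "description", "requirements"]
docs = df[text_cols].fillna("").agg(" ".join, axis=1)
y = df["fraudulent"]

X_train, X_test, y_train, y_test = train_test_split(
    docs, y, test_size=0.2, stratify=y, random_state=42
)

# TF-IDF vectorisation (one of the three feature representations compared).
vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# XGBoost, the best-performing of the eight compared classifiers.
clf = XGBClassifier(n_estimators=300, eval_metric="logloss")
clf.fit(X_train_vec, y_train)
print(classification_report(y_test, clf.predict(X_test_vec)))
```

Swapping TfidfVectorizer for HashingVectorizer, or the classifier for any of the other seven models, reproduces the comparative structure of the study.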

Local Agentic RAG-Based Information System Development for Intelligent Analysis of GitHub Code Repositories in Computer Science Education

By Zhengbing Hu, Markiian-Mykhailo Paprotskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Dmytro Uhryn

DOI: https://doi.org/10.5815/ijmecs.2025.05.07, Pub. Date: 8 Oct. 2025

This study presents the development and evaluation of a local agent-based Retrieval-Augmented Generation (Agentic RAG) system designed for the intelligent analysis of GitHub repositories in computer science education and IT practice. The novelty of this work lies not in inventing a new RAG algorithm, but in orchestrating multiple existing components (LangChain, Redis, SentenceTransformer, and LLMs) into a multi-stage agent pipeline with integrated relevance evaluation, specifically adapted to offline repository mining. The proposed pipeline consists of four sequential stages: (1) query reformulation by a dedicated LLM agent, (2) semantic retrieval using SentenceTransformer embeddings stored in Redis, (3) response generation by a second LLM, and (4) relevance scoring through a verification agent with retry logic. Relevance is assessed via cosine similarity and LLM-based scoring, allowing iterative refinement of answers. Experimental testing compared the system against two baselines: keyword search and a non-agentic single-stage RAG pipeline. Results showed an average MRR@10 of 0.72, compared to 0.48 for keyword search and 0.61 for non-agentic RAG, representing a 33% relative improvement in retrieval quality. Human evaluators (n=15, computer science students) rated generated explanations on a 5-point Likert scale; the proposed system achieved an average 4.3/5 for clarity and correctness, compared to 3.6/5 for the baseline. Precision@5 for code retrieval improved from 0.54 (keyword) and 0.67 (non-agentic RAG) to 0.76 in the proposed system. Average query latency in the local environment was 3.8 seconds, indicating acceptable performance for educational and small-team IT use cases. The system demonstrates high autonomy by operating fully on-premises with only optional API access to LLMs, ensuring privacy and independence from cloud providers. Ease of use was measured through a System Usability Scale (SUS) questionnaire, yielding a score of 78/100, reflecting positive user perception of the Streamlit interface and minimal setup requirements. Nevertheless, several limitations were observed: the high computational cost of running embeddings and LLMs locally, potential hallucinations in generated explanations (particularly for complex or unfamiliar code), and the inability of vector search to fully capture code syntax and control flow structures. Furthermore, while the Analytic Hierarchy Process (AHP) was applied to select the system architecture, future work should complement this with benchmark-driven evaluations for greater objectivity. The contribution of this study is threefold: (1) introducing a multi-agent orchestration logic tailored to educational code repositories; (2) empirically demonstrating measurable gains in retrieval quality and explanation usefulness over baselines; and (3) highlighting both opportunities and limitations of deploying autonomous RAG systems locally. The proposed technology can benefit IT companies seeking secure in-house tools for repository analysis, universities aiming to integrate intelligent assistants into programming courses, and research institutions requiring reproducible, privacy-preserving environments for code exploration.
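A condensed sketch of the four-stage agent loop may help make the pipeline concrete. For self-containment it keeps embeddings in memory (the described system stores them in Redis), treats llm() as a placeholder for any locally served model, and uses an illustrative embedding model and acceptance threshold; none of these specifics come from the paper.

```python
# Condensed sketch of the four-stage Agentic RAG loop described above.
# Embeddings are kept in memory for brevity (the system stores them in
# Redis); llm is a placeholder callable for a locally served model.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model


def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 5):
    """Stage 2: cosine-similarity retrieval over repository chunks."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # vectors are pre-normalised, so dot = cosine
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]


def answer(question: str, chunks: list[str], llm, max_retries: int = 2) -> str:
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)
    query = question
    for _ in range(max_retries + 1):
        # Stage 1: a dedicated agent reformulates the user query.
        query = llm(f"Rewrite as a precise code-search query: {query}")
        context = "\n---\n".join(retrieve(query, chunks, chunk_vecs))
        # Stage 3: a second LLM answers from the retrieved context.
        draft = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        # Stage 4: a verification agent scores relevance; retry if low.
        score = float(llm(
            f"Rate from 0 to 1 how well this answers the question.\n"
            f"Q: {question}\nA: {draft}\nReply with the number only:"
        ))
        if score >= 0.75:  # illustrative acceptance threshold
            return draft
    return draft
```

The retry loop mirrors the paper's iterative refinement: a low verification score sends the reformulated query back through retrieval and generation rather than returning a weak answer immediately.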
