Igor V. Malyk

Work place: Department of Mathematical Problems of Control and Cybernetics, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58000, Ukraine

E-mail: i.malyk@chnu.edu.ua

Research Interests: Deep Learning

Biography

Igor V. Malyk was born in Ukraine on July 19, 1983. In 2005 he graduated from Yuriy Fedkovych Chernivtsi National University (Chernivtsi, Ukraine) with a specialty in statistics. In 2009 he defended his PhD thesis at the Institute of Cybernetics (Kiev, Ukraine) in the specialty of theoretical foundations of computer science and cybernetics. In 2018 he defended his doctoral thesis at the Institute of Cybernetics (Kiev, Ukraine) in the same specialty.
From 2005 to 2017, he worked as an assistant and later as an associate professor at the Department of Systems Analysis and Insurance and Financial Mathematics (Yuriy Fedkovych Chernivtsi National University, Chernivtsi, Ukraine). Since September 2017, he has worked as an associate professor at the Department of Mathematical Problems of Control and Cybernetics (Yuriy Fedkovych Chernivtsi National University, Chernivtsi, Ukraine), and since 2024 he has been a full professor and head of the department. His research interests are self-adaptive algorithms in machine learning, random processes, stochastic differential equations, deep learning, and neural networks.
Prof. Malyk is a member of the CHEST Journal editorial team (statistical reviewer), the Multidisciplinary Digital Publishing Institute (MDPI) editorial team, the American Mathematical Society, and the International Society for Computational Biology (ISCB) editorial team.

Author Articles
A Hybrid Active and Semi-Supervised Learning Framework for Classification with Minimal Labeled Data

By Kostiantyn O. Minkov, Igor V. Malyk

DOI: https://doi.org/10.5815/ijisa.2026.03.05, Pub. Date: 8 Jun. 2026

Modern machine learning models typically require large amounts of precisely labeled data to perform effectively. However, obtaining such labels is time-consuming and costly, especially in specialized domains such as medical image analysis and document classification, where unlabeled data is abundant but expert annotation is scarce. This paper addresses the problem of learning from very few labeled examples by jointly leveraging weak supervision, active learning (AL), and semi-supervised learning (SSL). A hybrid framework is proposed in which a small set of informative samples is actively selected for manual annotation using an entropy-based acquisition function combined with weak label disagreement scoring, while a large pool of unlabeled or weakly labeled data is exploited through SSL based on the FixMatch algorithm. The approach iteratively corrects noisy labels and refines the model with minimal human involvement. The framework is evaluated using a ResNet-18 classifier on the CIFAR-10 benchmark dataset and is compared against two baselines: pure active learning and pure semi-supervised learning. Each method is run independently across three random seeds at the key active learning rounds, and accuracy is reported as mean ± standard deviation. The hybrid framework consistently leads both baselines at intermediate labeling budgets, with the largest absolute gap at Round 15 (+1.27 percentage points over pure active learning, +1.35 percentage points over pure SSL). The framework also offers a clear label-efficiency advantage: at Round 15, with |D_L| = 6500 labels, the hybrid method already reaches 0.6792 ± 0.0097 test accuracy, exceeding the accuracies that pure active learning (0.6730 ± 0.0139) and pure SSL (0.6687 ± 0.0056) attain only at Round 20 with |D_L| = 7000. By Round 20 all three methods saturate near a common data ceiling, indicating that the integrated use of weak supervision, active learning, and consistency-based SSL is most valuable when the annotation budget is genuinely constrained.
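
As a rough illustration of the sample-selection step described above, the following sketch scores each unlabeled sample by blending normalized predictive entropy with a weak-label disagreement rate. The abstract does not give the exact formula, so the blending weight `alpha`, the majority-vote definition of disagreement, and all function names are assumptions made for this example, not the authors' implementation.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weak_label_disagreement(votes):
    """Fraction of weak-labeler votes that differ from the majority vote."""
    if not votes:
        return 0.0
    majority_count = max(votes.count(v) for v in set(votes))
    return 1.0 - majority_count / len(votes)


def acquisition_scores(prob_rows, vote_rows, num_classes, alpha=0.5):
    """Blend normalized entropy and disagreement (alpha is a hypothetical weight)."""
    max_entropy = math.log(num_classes)  # entropy of the uniform distribution
    scores = []
    for probs, votes in zip(prob_rows, vote_rows):
        entropy_term = predictive_entropy(probs) / max_entropy
        disagreement_term = weak_label_disagreement(votes)
        scores.append(alpha * entropy_term + (1.0 - alpha) * disagreement_term)
    return scores


def select_for_annotation(prob_rows, vote_rows, num_classes, budget, alpha=0.5):
    """Return indices of the `budget` highest-scoring samples."""
    scores = acquisition_scores(prob_rows, vote_rows, num_classes, alpha)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy pool: three samples, three classes, three weak labelers each.
pool_probs = [[0.98, 0.01, 0.01], [1 / 3, 1 / 3, 1 / 3], [0.6, 0.3, 0.1]]
pool_votes = [[0, 0, 0], [0, 1, 2], [0, 0, 1]]
chosen = select_for_annotation(pool_probs, pool_votes, num_classes=3, budget=2)
```

In this toy pool the confident, unanimously weak-labeled first sample is left out, while the two ambiguous samples are sent for manual annotation.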

Data Optimization through Compression Methods Using Information Technology

By Igor V. Malyk, Yevhen Kyrychenko, Mykola Gorbatenko, Taras Lukashiv

DOI: https://doi.org/10.5815/ijitcs.2025.05.07, Pub. Date: 8 Oct. 2025

Efficient comparison of heterogeneous tabular datasets is difficult when sources are unknown or weakly documented. We address this problem by introducing a unified, type-aware framework that builds compact data representations (CDRs), concise summaries sufficient for downstream analysis, and a corresponding similarity graph (and tree) over a data corpus. Our novelty is threefold: (i) a principled vocabulary and procedure for constructing CDRs per variable type (factor, time, numeric, string), (ii) a weighted, type-specific similarity metric we call Data Information Structural Similarity (DISS) that aggregates distances across heterogeneous summaries, and (iii) an end-to-end, cloud-scalable realization that supports large corpora. Methodologically, factor variables are summarized by frequency tables; time variables by fixed-bin histograms; numeric variables by moment vectors (up to the fourth order); and string variables by TF-IDF vectors. Pairwise similarities use Hellinger, Wasserstein (p=1), total variation, and L1/L2 distances, with MAE/MAPE for numeric summaries; the DISS score combines these via learned or user-set weights to form an adjacency graph whose minimum-spanning tree yields a similarity tree. In experiments on multi-source CSVs, the approach enables accurate retrieval of closest datasets and robust corpus-level structuring while reducing storage and I/O. This contributes a reproducible pathway from raw tables to a similarity tree, clarifying terminology and providing algorithms that practitioners can deploy at scale.
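
To make the factor-variable part of this pipeline concrete, the sketch below summarizes a categorical column as a frequency table and compares two such tables with the Hellinger and total variation distances named above. The weighted combination at the end is only a guess at how per-variable distances might be aggregated; the abstract does not specify the DISS formula, so `combined_distance` and its weights are hypothetical.

```python
import math
from collections import Counter


def frequency_table(values):
    """Summarize a factor (categorical) column as relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(squared / 2.0)


def total_variation(p, q):
    """Total variation distance between two discrete distributions, in [0, 1]."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def combined_distance(per_variable, weights):
    """Weighted mean of per-variable distances (hypothetical aggregation rule)."""
    total_weight = sum(weights[name] for name in per_variable)
    weighted_sum = sum(weights[name] * d for name, d in per_variable.items())
    return weighted_sum / total_weight


# Toy example: the same factor column drawn from two datasets.
table_a = frequency_table(["a", "a", "b", "b"])
table_b = frequency_table(["a", "a", "a", "a"])
tv = total_variation(table_a, table_b)
hd = hellinger(table_a, table_b)
```

Both distances are bounded by 1, which makes them convenient to mix in a weighted score; the other summary types (histograms, moment vectors, TF-IDF vectors) would need their own distances before aggregation.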
