Kostiantyn O. Minkov; Igor V. Malyk

A Hybrid Active and Semi-Supervised Learning Framework for Classification with Minimal Labeled Data

Author(s)

Kostiantyn O. Minkov ¹,*, Igor V. Malyk ¹

1. Department of Mathematical Problems of Control and Cybernetics, Chernivtsi National University, Chernivtsi, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.03.05

Received: 26 Jan. 2026 / Revised: 17 Mar. 2026 / Accepted: 25 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

Active Learning, Semi-Supervised Learning, Neural Networks, Classification, Machine Learning, Data Analysis, Model, Accuracy

Abstract

Modern machine learning models typically require large amounts of accurately labeled data to perform well. Obtaining such labels is time-consuming and costly, especially in specialized domains such as medical image analysis and document classification, where unlabeled data is abundant but expert annotation is scarce. This paper addresses learning from very few labeled examples by jointly leveraging weak supervision, active learning (AL), and semi-supervised learning (SSL). In the proposed hybrid framework, a small set of informative samples is actively selected for manual annotation using an entropy-based acquisition function combined with weak-label disagreement scoring, while a large pool of unlabeled or weakly labeled data is exploited through SSL based on the FixMatch algorithm. The approach iteratively corrects noisy labels and refines the model with minimal human involvement. The framework is evaluated with a ResNet-18 classifier on the CIFAR-10 benchmark dataset and compared against two baselines: pure active learning and pure semi-supervised learning. Each method is run independently with three random seeds at the key active learning rounds, and accuracy is reported as mean ± standard deviation. The hybrid framework consistently leads both baselines at intermediate labeling budgets, with the largest absolute gap at Round 15 (+1.27 percentage points over pure active learning, +1.35 percentage points over pure SSL). It also offers a clear label-efficiency advantage: at Round 15, with |D_L| = 6500 labels, the hybrid method already reaches 0.6792 ± 0.0097 test accuracy, exceeding the accuracies that pure active learning (0.6730 ± 0.0139) and pure SSL (0.6687 ± 0.0056) attain only at Round 20 with |D_L| = 7000. By Round 20 all three methods saturate near a common data ceiling, indicating that the integrated use of weak supervision, active learning, and consistency-based SSL is most valuable when the annotation budget is genuinely constrained.
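
The selection and pseudo-labeling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how entropy and weak-label disagreement are combined, so the mixing weight `alpha`, the majority-vote definition of disagreement, and all function names are assumptions made for illustration. The 0.95 confidence threshold is the default commonly used with FixMatch and may differ from the paper's setting.

```python
import math
from collections import Counter


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weak_label_disagreement(votes):
    """Fraction of weak-label votes that differ from the majority vote.

    Assumption: disagreement is measured against the majority label;
    the paper may define it differently.
    """
    if not votes:
        return 0.0
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def acquisition_scores(prob_rows, vote_rows, alpha=0.5):
    """Combine normalized entropy and weak-label disagreement per sample.

    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    max_entropy = math.log(len(prob_rows[0]))  # entropy of a uniform prediction
    scores = []
    for probs, votes in zip(prob_rows, vote_rows):
        entropy = predictive_entropy(probs) / max_entropy
        disagreement = weak_label_disagreement(votes)
        scores.append(alpha * entropy + (1.0 - alpha) * disagreement)
    return scores


def select_for_annotation(scores, budget):
    """Indices of the `budget` highest-scoring pool samples."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


def fixmatch_pseudo_labels(weak_view_probs, threshold=0.95):
    """FixMatch-style confidence masking on weakly augmented predictions.

    Returns (sample index, pseudo-label) pairs for confident samples only;
    the remaining samples contribute nothing to the unsupervised loss.
    """
    confident = []
    for index, probs in enumerate(weak_view_probs):
        best_class = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best_class] >= threshold:
            confident.append((index, best_class))
    return confident


# Toy pool: three samples, three classes, three weak labelers per sample.
pool_probs = [[0.98, 0.01, 0.01], [0.34, 0.33, 0.33], [0.70, 0.20, 0.10]]
pool_votes = [[0, 0, 0], [0, 1, 2], [0, 0, 1]]

scores = acquisition_scores(pool_probs, pool_votes)
print(select_for_annotation(scores, budget=2))  # samples sent to the annotator
print(fixmatch_pseudo_labels(pool_probs))       # samples kept for the SSL step
```

In this toy pool, the sample with a near-uniform prediction and fully conflicting weak labels ranks first for annotation, while only the confidently predicted sample receives a pseudo-label.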

Cite This Paper

Kostiantyn O. Minkov, Igor V. Malyk, "A Hybrid Active and Semi-Supervised Learning Framework for Classification with Minimal Labeled Data", International Journal of Intelligent Systems and Applications (IJISA), Vol.18, No.3, pp.81-91, 2026. DOI: 10.5815/ijisa.2026.03.05

