Work place: Department of Mathematical Problems of Control and Cybernetics, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58000, Ukraine
E-mail: kyrychenko.yevhen@chnu.edu.ua
Biography
Yevhen Kyrychenko was born in Ukraine on October 23, 1998. He received the B.Sc. and M.Sc. degrees in computer science from Ivan Franko National University of Lviv, Ukraine, in 2020 and 2022, respectively. He is currently pursuing the Ph.D. degree in software engineering at the Department of Software Engineering, Yuriy Fedkovych Chernivtsi National University, Ukraine. His major fields of study include cloud computing, big data technologies, and distributed systems.
He is currently working as a Senior Software Engineer at EPAM Systems, Ukraine. He has been involved in research and development projects related to cloud-native architectures, scalable data processing, and distributed computing. His research has been published in peer-reviewed conference proceedings. His current research interests include distributed data processing systems, big data analytics, and cloud infrastructure optimization.
By Igor V. Malyk, Yevhen Kyrychenko, Mykola Gorbatenko, Taras Lukashiv
DOI: https://doi.org/10.5815/ijitcs.2025.05.07, Pub. Date: 8 Oct. 2025
Efficient comparison of heterogeneous tabular datasets is difficult when sources are unknown or weakly documented. We address this problem by introducing a unified, type-aware framework that builds compact data representations (CDRs; concise summaries sufficient for downstream analysis) and a corresponding similarity graph (and tree) over a data corpus. Our novelty is threefold: (i) a principled vocabulary and procedure for constructing CDRs per variable type (factor, time, numeric, string), (ii) a weighted, type-specific similarity metric we call Data Information Structural Similarity (DISS) that aggregates distances across heterogeneous summaries, and (iii) an end-to-end, cloud-scalable realization that supports large corpora. Methodologically, factor variables are summarized by frequency tables; time variables by fixed-bin histograms; numeric variables by moment vectors (up to the fourth order); and string variables by TF-IDF vectors. Pairwise similarities use Hellinger, Wasserstein (p=1), total variation, and L1/L2 distances, with MAE/MAPE for numeric summaries; the DISS score combines these via learned or user-set weights to form an adjacency graph whose minimum-spanning tree yields a similarity tree. In experiments on multi-source CSVs, the approach enables accurate retrieval of closest datasets and robust corpus-level structuring while reducing storage and I/O. This contributes a reproducible pathway from raw tables to a similarity tree, clarifying terminology and providing algorithms that practitioners can deploy at scale.
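The pipeline described in the abstract (per-type summaries, a weighted distance, then a minimum spanning tree) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only factor and numeric variables, the function names, the toy tables, and the plain weighted sum used for the DISS-like score are assumptions made for illustration, and the paper's actual normalization and weighting may differ.

```python
import math
from collections import Counter


def frequency_table(values):
    """CDR for a factor variable: relative frequency of each level."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def moment_vector(values):
    """CDR for a numeric variable: mean plus central moments of order 2 to 4."""
    n = len(values)
    mean = sum(values) / n
    return [mean] + [sum((v - mean) ** k for v in values) / n for k in (2, 3, 4)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions stored as dicts."""
    levels = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in levels
    )
    return math.sqrt(total / 2.0)


def mae(a, b):
    """Mean absolute error between two equal-length summary vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_cdr(table):
    """Summarize a toy table with one factor and one numeric column (assumed layout)."""
    return {
        "factor": frequency_table(table["factor"]),
        "numeric": moment_vector(table["numeric"]),
    }


def diss(cdr_a, cdr_b, weights):
    """DISS-like score: a weighted sum of type-specific distances (assumed form)."""
    return (
        weights["factor"] * hellinger(cdr_a["factor"], cdr_b["factor"])
        + weights["numeric"] * mae(cdr_a["numeric"], cdr_b["numeric"])
    )


def similarity_tree(names, cdrs, weights):
    """Minimum spanning tree over the complete DISS graph, via Prim's algorithm."""
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to a dataset outside it.
        dist, u, v = min(
            (diss(cdrs[u], cdrs[v], weights), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Toy corpus: A and B are similar, C is clearly different.
tables = {
    "A": {"factor": ["x", "x", "y"], "numeric": [1.0, 2.0, 3.0]},
    "B": {"factor": ["x", "y", "y"], "numeric": [1.0, 2.0, 4.0]},
    "C": {"factor": ["z", "z", "z"], "numeric": [10.0, 20.0, 30.0]},
}
cdrs = {name: build_cdr(table) for name, table in tables.items()}
weights = {"factor": 0.5, "numeric": 0.5}  # user-set weights, chosen arbitrarily
tree = similarity_tree(list(tables), cdrs, weights)
for u, v, dist in tree:
    print(f"{u} - {v}: {dist:.3f}")
```

Only the CDRs are compared here, never the raw rows, which is the point of the compact representation: once each table is summarized, pairwise scoring and tree construction need no further access to the source data. In this sketch the raw moment differences are unbounded while the Hellinger distance lies in [0, 1], so a real deployment would likely rescale the numeric term (for example with MAPE) before weighting.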