Work place: Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine
Website: https://orcid.org/0000-0003-4858-4511
Research Interests: Geographic Information System, Decision Support System, Data Mining, Information Systems, Systems Architecture, Artificial Intelligence
Dmytro Uhryn graduated from Yuriy Fedkovych Chernivtsi National University, Chernivtsi. He is a Doctor of Technical Sciences and an associate professor at Yuriy Fedkovych Chernivtsi National University. He has published more than 120 works. His research interests are data mining, information technologies for decision support, swarm intelligence systems, and industry-specific geographic information systems.
DOI: https://doi.org/10.5815/ijmecs.2024.01.03, Pub. Date: 8 Feb. 2024
The article develops a technology for generating continuations of song lyrics using large language models, in particular the T5 model, in order to speed up, supplement, and add flexibility to the process of writing lyrics, with or without taking into account the style of a particular author. To build the dataset, 10 different artists were selected and their lyrics were collected, giving a total of 626 unique songs. After splitting each song into several input-output pairs, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were fine-tuned for the task of generating song lyrics. For both models, t5-base was chosen as the base model; this version of T5 contains 223 million parameters. Analysis of the output data showed that the NSA model produces fewer degraded results, and that for the SA model it is necessary to balance the amount of text for each author. Several text metrics (BLEU, ROUGE-L, and ROUGE-N) were calculated to compare the models and generation strategies quantitatively. BLEU varies the most, and its value changes significantly depending on the strategy, whereas the ROUGE metrics show less variability and a narrower range of values. In total, 8 decoding methods for text generation supported by the transformers library were compared: Greedy search, Beam search, Diverse beam search, Multinomial sampling, Beam-search multinomial sampling, Top-k sampling, Top-p sampling, and Contrastive search. All the comparisons show that the best method for generating lyrics is beam search and its variations, including beam sampling. Contrastive search usually outperformed the plain greedy approach. The top-p and top-k methods have no clear advantage over each other and produced different results in different situations.
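To make the difference between some of the listed decoding strategies concrete, the following is a minimal pure-Python sketch of how greedy search, top-k sampling, and top-p sampling treat a single next-token distribution. It is an illustration only and is not the code used in the article: the toy vocabulary, the probabilities, and the function names `greedy_pick`, `top_k_filter`, and `top_p_filter` are assumptions made for this example, and the transformers library implements these strategies internally on model logits.

```python
def greedy_pick(probs):
    """Greedy search: always take the single most probable token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Top-k sampling: keep the k most probable tokens, then renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Top-p sampling: keep the smallest set of most probable tokens whose
    cumulative probability reaches p, then renormalize."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Hypothetical next-token distribution (values invented for illustration).
toy = {"love": 0.5, "night": 0.3, "road": 0.15, "xylophone": 0.05}

if __name__ == "__main__":
    print(greedy_pick(toy))
    print(top_k_filter(toy, k=2))
    print(top_p_filter(toy, p=0.9))
```

With a fixed k, the candidate set has the same size for every step, while with top-p its size depends on how peaked the distribution is. This is one plausible reason why neither method dominates the other across inputs.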
DOI: https://doi.org/10.5815/ijmecs.2023.04.06, Pub. Date: 8 Aug. 2023
A generalized model of population migration is proposed. On its basis, models are developed for the set of directions of population flows and for the duration of migration, which is determined by its nature in time and by the type and form of migration. A model of indicators of actual migration (resettlement) is developed, and the indicators are divided into groups. The results of population migration are described by a number of absolute and relative indicators for the purpose of regression analysis. To obtain these results, the model takes into account the magnitude of migration flows, which depends on the population of the territories between which the exchange takes place and on their location, using coefficients of the effectiveness and of the intensity of migration ties. Types of migration intensity coefficients are defined depending on their properties. The LightGBM algorithm for predicting population migration is implemented in an intelligent geographic information system. The migration forecasting system is also capable of predicting international migration, that is, migration between different countries. The significance of this study lies in the increasing need for accurate and reliable migration forecasts. With globalization and the growing connectivity of nations, understanding and predicting migration patterns has become crucial for various domains, including social planning, resource allocation, and economic development. Through extensive experimentation and evaluation, the developed system has demonstrated forecasts of human migration based on machine learning algorithms. The performance of the migration flow forecasting models is evaluated using several indicators, including the mean square error (MSE), root mean square error (RMSE), and R-squared (R2). The MSE measures the mean squared difference between predicted and actual values, the RMSE is its square root, and R2 represents the proportion of variance explained by the model.
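The three evaluation metrics named above can be sketched in a few lines of plain Python. This uses the standard textbook definitions (MSE as the mean of squared errors, RMSE as its square root, R2 as one minus the ratio of the residual to the total sum of squares). The function name `regression_metrics` and the sample values are assumptions for illustration and are not data from the article.

```python
import math


def regression_metrics(actual, predicted):
    """Return (MSE, RMSE, R2) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # Residual sum of squares: squared differences between truth and forecast.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total sum of squares: spread of the true values around their mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2


if __name__ == "__main__":
    # Hypothetical migration counts (in thousands) and model forecasts.
    actual = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.0, 5.0, 8.0, 9.0]
    print(regression_metrics(actual, predicted))
```

Because RMSE is expressed in the same units as the target, it is usually easier to interpret than MSE, while R2 is unitless and allows comparison across targets of different scale.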
DOI: https://doi.org/10.5815/ijmecs.2023.03.06, Pub. Date: 8 Jun. 2023
The article develops a technology for finding tweet trends based on clustering, which produces a data stream of short cluster representations and their popularity for further research of public opinion. The accuracy of the results is affected by the natural-language features of the tweet stream. An effective approach to tweet collection, filtering, cleaning, and pre-processing is described, based on a comparative analysis of the Bag of Words, TF-IDF, and BERT algorithms. The impact of stemming and lemmatization on the quality of the obtained clusters was determined: they reduce the input vocabulary of Ukrainian words by 40.21% and 32.52% respectively. Optimal combinations of clustering methods (K-Means, Agglomerative Hierarchical Clustering, and HDBSCAN) and tweet vectorization methods were found based on the analysis of 27 clusterings of a single data sample. A method for presenting clusters of tweets in a short format was selected. Algorithms using the Levenshtein distance, namely fuzz sort, fuzz set, and Levenshtein, showed the best results. These algorithms run quickly and show a larger difference in similarities, which makes it possible to determine the similarity threshold more accurately. According to the clustering results, the optimal solutions are to use the HDBSCAN clustering algorithm with BERT vectorization to achieve the most accurate results, and to use K-Means together with TF-IDF to achieve the best speed with an optimal result. Stemming can be used to reduce execution time. In this study, the best options for comparing cluster fingerprints were found experimentally among the following similarity search methods: Fuzz Sort, Fuzz Set, Levenshtein, Jaro Winkler, Jaccard, Sorensen, Cosine, and Sift4. For some algorithms, the average fingerprint similarity exceeds 70%. Three tools proved effective for comparing fingerprints, as they show a sufficient difference (> 20%) between comparisons of similar and dissimilar clusters.
The experimental testing was based on the analysis of 90,000 tweets collected over 7 days for 5 different weekly topics: President Volodymyr Zelenskyi, Leopard tanks, Boris Johnson, Europe, and the bright memory of the deceased. The research used three combinations of clustering and vectorization methods: K-Means with TF-IDF, Agglomerative Hierarchical Clustering with TF-IDF, and HDBSCAN with BERT. Additionally, fuzz sort was implemented for comparing cluster fingerprints with a similarity threshold of 55%. For comparing fingerprints, the best methods were fuzz sort, fuzz set, and Levenshtein. Among these three, the best execution speed was achieved with the Levenshtein method; the other two were about three times slower, but still nearly 13 times faster than Sift4. The fastest method overall is Jaro Winkler, but it has only a 19.51% difference in similarities. The method with the best difference in similarities is fuzz set (60.29%), with fuzz sort (32.28%) and Levenshtein (28.43%) in second and third place respectively. All three rely on the Levenshtein distance, indicating that this approach works well for comparing sets of keywords. The other algorithms fail to show significant differences between different fingerprints, suggesting that they are not suited to this type of task.
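The fingerprint comparison described above can be sketched in plain Python as follows. The sketch assumes that "fuzz sort" refers to a token-sort style comparison (tokens are sorted before the edit distance is computed) and it normalizes the Levenshtein distance by the longer string, which is a simplification and not necessarily the scoring formula of the library used in the article. The function names and example fingerprints are hypothetical; the 0.55 threshold mirrors the 55% value mentioned in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def token_sort_similarity(a, b):
    """Lowercase, sort the whitespace-separated tokens, then compare."""
    def normalize(text):
        return " ".join(sorted(text.lower().split()))
    return similarity(normalize(a), normalize(b))


def same_topic(fingerprint_a, fingerprint_b, threshold=0.55):
    """Treat two cluster fingerprints as the same trend above the threshold."""
    return token_sort_similarity(fingerprint_a, fingerprint_b) >= threshold


if __name__ == "__main__":
    print(same_topic("leopard tanks ukraine", "ukraine leopard tank"))
    print(same_topic("leopard tanks ukraine", "boris johnson visit"))
```

Sorting the tokens first makes the comparison insensitive to keyword order, which matters when a fingerprint is an unordered set of cluster keywords.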
DOI: https://doi.org/10.5815/ijmecs.2023.02.06, Pub. Date: 8 Apr. 2023
A method of choosing swarm optimization algorithms and using swarm intelligence to solve a certain class of optimization tasks in industry-specific geographic information systems was developed, taking into account the stationarity of such systems. The method consists of 8 stages. Classes of swarm algorithms were studied, and it is shown which classes should be used depending on whether the task solved by an industry-specific geographic information system is stationary, quasi-stationary, or dynamic. An information model of geodata was developed that formally combines their spatial and attributive components and allows the relational, semantic, and frame models of knowledge representation to be considered for the attributive component. A method of choosing optimization methods designed to work as part of a decision support system within an industry-specific geographic information system was developed. It includes conceptual information modeling, selection of optimization criteria, and analysis and modeling of the objective function. This method allows choosing the most suitable swarm optimization method (or a set of methods).
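As a concrete instance of the kind of swarm algorithm such a selection method would choose among, the following is a minimal particle swarm optimization (PSO) sketch for a stationary task, that is, an objective function that does not change over time. PSO is used here only as a well-known representative of swarm algorithms; the abstract does not state which algorithms the article selects. The function name `pso_minimize`, the parameter values, and the test objective are all assumptions for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic global-best PSO.

    w is the inertia weight; c1 and c2 weight the pull toward each particle's
    own best position and toward the swarm's best position respectively.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_best_val.__getitem__)
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Clamp the new position to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val


def sphere(point):
    """Simple stationary test objective with its minimum at the origin."""
    return sum(x * x for x in point)


if __name__ == "__main__":
    best_point, best_value = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
    print(best_point, best_value)
```

For quasi-stationary or dynamic tasks, where the objective drifts over time, a plain global-best PSO like this one can get stuck on an outdated optimum, which is why the class of algorithm has to be matched to the stationarity of the task.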