IJWMT Vol. 16, No. 3, 8 Jun. 2026
Pages: 160-171
Keywords: Multicollinearity, effects of multicollinearity, correlation analysis, Variance Inflation Factor, Condition Index, Principal Component Analysis, Receiver Operating Characteristic
Multicollinearity not only introduces redundant information, because strongly correlated features largely repeat one another, but also undermines the stability of linear artificial intelligence models and the reliability of their results. Its negative effects are especially visible when building mathematical models for artificial intelligence algorithms: the coefficients of a model fitted to a dataset with multicollinearity are unstable, so scientific conclusions drawn from those coefficients can be misleading. This article first discusses multicollinearity and its negative consequences in detail. It then examines methods for detecting multicollinearity in a dataset based on the correlation coefficient, the Variance Inflation Factor, and the Condition Index. Next, it analyzes methods for eliminating multicollinearity: removing features, combining features, and Principal Component Analysis. The study also investigates the impact of multicollinearity on machine learning models such as LogisticRegression, LinearRegression, LinearSVC, and XGBClassifier using a dataset that exhibits multicollinearity. The results show that eliminating multicollinearity improves the predictive performance of all the models considered. In particular, the ROC AUC increased by 0.102 for Logistic Regression, by 0.129 for the Ridge Classifier, and by 0.121 for Linear SVC. Although XGBoost showed the smallest gain, 0.094, its overall performance remained the highest of all the models. Finally, the article presents conclusions and recommendations based on the experimental results.
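The detection and removal steps summarized above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data, the helper names (`high_corr_pairs`, `vif`, `condition_indices`, `drop_high_vif`, `pca_scores`), and the cutoffs (|r| > 0.9, VIF > 10, condition index > 30, which are common rules of thumb) are all hypothetical. The VIF uses the identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse feature correlation matrix, and the condition indices use the eigenvalues of that correlation matrix, which is one common variant of the diagnostic.

```python
import numpy as np

# Hypothetical thresholds (common rules of thumb, not taken from the article).
CORR_THRESHOLD = 0.9
VIF_THRESHOLD = 10.0
CI_THRESHOLD = 30.0


def correlation_matrix(X):
    """Pearson correlation matrix of the columns of X (samples x features)."""
    return np.corrcoef(X, rowvar=False)


def high_corr_pairs(X, names, threshold=CORR_THRESHOLD):
    """List feature pairs whose absolute correlation exceeds `threshold`."""
    R = correlation_matrix(X)
    return [
        (names[i], names[j], R[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(R[i, j]) > threshold
    ]


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), computed as the diagonal of inv(R),
    where R is the feature correlation matrix."""
    return np.diag(np.linalg.inv(correlation_matrix(X)))


def condition_indices(X):
    """sqrt(lambda_max / lambda_k) over eigenvalues of the correlation matrix."""
    eigvals = np.linalg.eigvalsh(correlation_matrix(X))
    return np.sqrt(eigvals.max() / eigvals)


def drop_high_vif(X, names, threshold=VIF_THRESHOLD):
    """Remove the feature with the largest VIF until every VIF <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:  # VIF is undefined for a single remaining feature
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


def pca_scores(X, n_components):
    """Project standardized X onto its leading principal components;
    the resulting score columns are mutually uncorrelated."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    return Z @ eigvecs[:, top]


# Synthetic data: x2 is a near-copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

print(high_corr_pairs(X, names))          # flags the (x1, x2) pair
print(np.round(vif(X), 1))                # x1, x2 in the hundreds; x3 near 1
print(condition_indices(X).max() > CI_THRESHOLD)  # True
X_reduced, kept = drop_high_vif(X, names)
print(kept)                               # one of x1/x2 is dropped, x3 is kept
scores = pca_scores(X, n_components=2)
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```

Dropping the feature with the largest VIF keeps the remaining columns directly interpretable, whereas PCA retains the information from all features in uncorrelated scores at the cost of interpretability. The model comparison by ROC AUC described in the abstract is not reproduced in this sketch.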
Akbar E. Rashidov, Hari Mohan Rai, Nurlan M. Tursinhanov, Sherzod A. Tursunov, "Impact of Reducing Multicollinearity in a Dataset on Artificial Intelligence Algorithms", International Journal of Wireless and Microwave Technologies (IJWMT), Vol. 16, No. 3, pp. 160-171, 2026. DOI: 10.5815/ijwmt.2026.03.11