IJITCS Vol. 17, No. 6, 8 Dec. 2025
Keywords: Public Transport, Interpretable Machine Learning, XGBoost, SHAP, Smart City, Delay Factor Analysis
Delay prediction in urban public transport systems is a critical task for improving operational efficiency and service reliability. While numerous predictive models exist, understanding the relative importance of contributing factors remains a challenge, and traditional approaches often overestimate the impact of stochastic weather conditions. This study proposes an approach that combines predictive modelling and factor analysis based on interpretable machine learning. An eXtreme Gradient Boosting (XGBoost) model was developed using a large dataset of operational and meteorological data from a city with approximately one million inhabitants. The model demonstrated high predictive accuracy, explaining 72% of the variance in delays (coefficient of determination R² = 0.72). Analysis of the model’s feature importance revealed that operational cycles (seasonal, weekly, daily) and spatial context (routes, stops) are the dominant predictors, collectively accounting for over 52% of the model’s total feature importance. Contrary to common assumptions, weather conditions were identified as a powerful secondary, rather than primary, factor. Although their cumulative feature importance was substantial (nearly 45%), the model revealed their impact to be highly contextual: the negative effects of adverse weather were significantly amplified during predictable peak operational hours but were minimal otherwise. This research demonstrates how Explainable Artificial Intelligence methods can transform complex predictive models into practical tools, providing a data-driven basis for shifting from reactive management to proactive, evidence-based planning.
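The two quantities reported in the abstract, the coefficient of determination and the share of total feature importance held by a group of predictors, can be illustrated with a minimal sketch. This is not the authors' code: the function names, feature names, group labels, and all numeric values below are hypothetical and chosen only to show the arithmetic, and the per-feature importances are assumed to come from some fitted model.

```python
from collections import defaultdict


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def group_shares(importances, groups):
    """Sum per-feature importances by group and normalise to shares of the total."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[groups[feature]] += value
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


# Hypothetical importances for illustration only; not results from the study.
importances = {"hour": 3.0, "weekday": 2.0, "route_id": 1.0,
               "rain_mm": 3.0, "temp_c": 1.0}
# Hypothetical mapping of each feature to a factor group.
groups = {"hour": "operational", "weekday": "operational",
          "route_id": "spatial", "rain_mm": "weather", "temp_c": "weather"}

shares = group_shares(importances, groups)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.0%}")
```

Grouping the importances before comparing them is what allows a statement such as "operational and spatial factors together outweigh weather" to be read directly from the model, rather than from a long list of individual features.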
Yurii Matseliukh, Vasyl Lytvyn, Zhengbing Hu, Myroslava Bublyk, "Predictive Modelling and Factor Analysis of Public Transport Delays in Smart City Using Interpretable Machine Learning", International Journal of Information Technology and Computer Science (IJITCS), Vol. 17, No. 6, pp. 1-28, 2025. DOI: 10.5815/ijitcs.2025.06.01