Data-driven Classification of Tsunami Evacuation Suitability Using XGBoost: A Case Study in Padang City

pp. 90-106

Author(s)

Sularno Sularno 1,*, Wendi Boy 2, Putri Anggraini 1, Ahmad Kamal 3, Fei Wang 4

1. Department of Information System, Dharma Andalas University, Padang, Indonesia

2. Department of Civil Engineering, Dharma Andalas University, Padang, Indonesia

3. Department of Information System, Pelita Indonesia Institute of Business and Technology, Pekanbaru, Indonesia

4. Department of Population Health Sciences, Weill Cornell Medicine, New York City, United States

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.01.07

Received: 14 Jun. 2025 / Revised: 13 Sep. 2025 / Accepted: 25 Oct. 2025 / Published: 8 Feb. 2026

Index Terms

Tsunami Evacuation, XGBoost, Machine Learning, Disaster Prediction, Early Warning System

Abstract

In this study, we developed a machine learning–based model that predicts the suitability of tsunami evacuation sites in Padang City using Extreme Gradient Boosting (XGBoost). The model was trained on a new synthetic dataset of 5,000 observations with key geospatial and demographic features such as elevation, distance to the coastline, suggested evacuation capacity, surrounding population count, and site area. The workflow consisted of preprocessing, feature selection with the XGBoost classifier, training and cross-validation of each model, and evaluation using both regression and classification metrics. XGBoost performed best (RMSE = 0.0642, MAE = 0.0418, accuracy = 93.8%), outperforming the Random Forest, Gradient Boosting Trees, and Logistic Regression models. These findings indicate that XGBoost can capture complex spatial–demographic relationships with little overfitting. Residual analysis and actual-versus-predicted plots also indicate good model calibration and stability. A web prototype was developed to visualize evacuation suitability and support spatial decision-making. Although the model is based on simulated data, it offers an extensible and interpretable framework that can be integrated with field data and operational disaster management systems. To the best of our knowledge, this work is the first use of the XGBoost algorithm in Indonesia to classify tsunami evacuation sites, and it provides a new tool for coastal disaster preparedness and evacuation planning.
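The abstract reports both regression metrics (RMSE, MAE) and a classification metric (accuracy) for the same classifier. One plausible reading, which is an assumption and is not confirmed by the abstract, is that the error metrics are computed on the predicted suitability probability against the 0/1 label, while accuracy is computed after thresholding that probability. The sketch below illustrates this reading with hypothetical labels and scores; the data, the function names, and the 0.5 threshold are illustrative and are not taken from the paper.

```python
import math


def rmse(y_true, y_score):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true))


def mae(y_true, y_score):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(t - s) for t, s in zip(y_true, y_score)) / len(y_true)


def accuracy(y_true, y_score, threshold=0.5):
    """Share of sites whose thresholded prediction matches the label."""
    hits = sum(int(s >= threshold) == t for t, s in zip(y_true, y_score))
    return hits / len(y_true)


# Hypothetical values (1 = suitable evacuation site); NOT data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.1, 0.8, 0.4, 0.2, 0.6, 0.7, 0.3]

metrics = {
    "rmse": rmse(y_true, y_score),
    "mae": mae(y_true, y_score),
    "accuracy": accuracy(y_true, y_score),
}
print(metrics)
```

Under this reading, a low RMSE or MAE means the predicted probabilities sit close to the labels, while accuracy depends only on which side of the threshold each prediction falls.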

Cite This Paper

Sularno Sularno, Wendi Boy, Putri Anggraini, Ahmad Kamal, Fei Wang, "Data-driven Classification of Tsunami Evacuation Suitability Using XGBoost: A Case Study in Padang City", International Journal of Intelligent Systems and Applications (IJISA), Vol. 18, No. 1, pp. 90-106, 2026. DOI: 10.5815/ijisa.2026.01.07
