Improving Agricultural Commodity Trading through Data Imputation Methods for Price Prediction Accuracy

pp. 24-36

Author(s)

Pattharaporn Thongnim 1,*, Sueppong Mueanchamnong 1

1. Department of Mathematics, Faculty of Science, Burapha University, Chonburi, 20131, Thailand

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2026.02.02

Received: 4 Nov. 2025 / Revised: 29 Dec. 2025 / Accepted: 2 Feb. 2026 / Published: 8 Apr. 2026

Index Terms

Missing Data, Imputation, Machine Learning, Internet of Things, Kalman Filter, Prophet Imputation, Moving Average

Abstract

Agricultural price prediction in developing regions faces significant challenges from missing data in Internet of Things (IoT)-based environmental monitoring systems, particularly in tropical fruit cultivation, where sensors frequently experience connectivity and operational failures. This study evaluates the impact of missing data imputation methods on agricultural price prediction model performance using environmental and market data from a commercial durian orchard in Chanthaburi Province, Thailand (2023-2024). Three imputation strategies (Linear Interpolation, Prophet, and Kalman Filter) were systematically compared across four machine learning algorithms (Regression Trees, Random Forest, XGBoost, and Artificial Neural Networks) using 10-fold cross-validation. The dataset comprised 182 observations with 28.02% missing environmental data and 68.13% missing price data, representing realistic constraints in developing agricultural economies. Results demonstrated that XGBoost consistently achieved superior performance across all imputation methods, with Kalman Filter combined with XGBoost showing the best testing performance (R² = 0.9767, MSE = 0.0013, MAE = 0.0287, MAPE = 1.49%). However, these results require careful interpretation given the limited sample size, high missingness, and potential temporal data leakage from random train-test splitting. Time series visualization revealed distinct characteristics: Linear Interpolation provided computational efficiency but oversimplified data complexity, Prophet captured seasonal patterns but introduced excessive noise, and Kalman Filter offered balanced performance, preserving both smoothness and natural variability. Practical price prediction analysis showed differences of up to 35 Thai Baht per kilogram between imputation methods. The findings provide methodological evidence for imputation strategy selection in agricultural IoT systems with missing data, though validation with larger multi-site datasets is essential before operational deployment.
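As an illustration of the evaluation steps named in the abstract, the following minimal Python sketch shows linear-interpolation imputation, the four reported error metrics (R², MSE, MAE, MAPE), and a chronological train-test split of the kind that avoids the temporal leakage mentioned above. It is not the authors' code: the function names `linear_interpolate`, `regression_metrics`, and `chronological_split`, the 20% test fraction, and the toy price values are assumptions made for illustration. Only the sample size of 182 comes from the abstract.

```python
import numpy as np


def linear_interpolate(values):
    """Fill NaN gaps with straight lines between the nearest observed points."""
    y = np.asarray(values, dtype=float)
    idx = np.arange(len(y))
    observed = ~np.isnan(y)
    # np.interp holds the first/last observed value constant beyond the ends.
    return np.interp(idx, idx[observed], y[observed])


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE and MAPE (in percent) for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        # Assumes no zero values in y_true (prices are strictly positive).
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
    }


def chronological_split(n_samples, test_fraction=0.2):
    """Index arrays where every test observation is later than every training one."""
    cut = int(round(n_samples * (1.0 - test_fraction)))
    return np.arange(cut), np.arange(cut, n_samples)


# Toy price series with gaps (illustrative values, not data from the study).
series = [100.0, np.nan, 110.0, np.nan, np.nan, 125.0]
filled = linear_interpolate(series)

# Toy predictions to exercise the metric definitions.
metrics = regression_metrics([100.0, 200.0], [110.0, 190.0])

# 182 is the number of observations reported in the abstract.
train_idx, test_idx = chronological_split(182, test_fraction=0.2)
```

Splitting by time order rather than at random keeps imputed values that were interpolated from future observations out of the training window's evaluation, which is one way to probe the leakage concern the abstract raises.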

Cite This Paper

Pattharaporn Thongnim, Sueppong Mueanchamnong, "Improving Agricultural Commodity Trading through Data Imputation Methods for Price Prediction Accuracy", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 18, No. 2, pp. 24-36, 2026. DOI: 10.5815/ijieeb.2026.02.02
