IJCNIS Vol. 17, No. 3, 8 Jun. 2025
Keywords: Big Data Time Analytics, Hierarchical Clustering, Time Sequence Predicting, Multi-scale Dynamic Time Warping, Long Short-Term Memory, Forecasting Accuracy
Time series forecasting in big data analytics is crucial for decision-making in a variety of fields, but it faces challenges due to high dimensionality, non-stationarity, and dynamic patterns. Conventional approaches frequently produce inaccurate results because they fail to capture sudden variations and intricate temporal dependencies. This study proposes a Multi-scale Dynamic Time Warping-based Hierarchical Clustering (MDTWbH) approach to improve forecasting accuracy and scalability. Multi-scale Dynamic Time Warping (MDTW) transforms time series data into multi-scale representations that preserve both local and global patterns, while hierarchical clustering groups similar sequences to improve predictive performance. The proposed framework integrates data preprocessing, outlier detection, and missing-value interpolation to refine the input data, and it employs Apache Hadoop and Spark for efficient big data processing. A Long Short-Term Memory (LSTM) model is applied within each cluster for forecasting, and model performance is assessed using accuracy, precision, recall, F1-score, MAE, and RMSE. Experimental results on electricity demand, wind speed, and taxi demand datasets demonstrate superior performance compared with existing techniques. MDTWbH provides a scalable and interpretable solution for large-scale time series forecasting by efficiently capturing evolving temporal patterns.
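The clustering stage described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact MDTW formulation, so everything here is an assumption made for illustration: the multi-scale representation is taken to be window averaging at scales 1, 2, and 4, the multi-scale distance is taken to be the mean of the per-scale DTW distances, and average linkage is used for the hierarchy. The function names (`dtw_distance`, `downsample`, `multiscale_dtw`, `cluster_series`) are hypothetical. The per-cluster LSTM forecaster and the Hadoop/Spark processing layer are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def downsample(x, factor):
    """Average non-overlapping windows of `factor` points; a trailing remainder is dropped."""
    x = np.asarray(x, dtype=float)
    usable = (len(x) // factor) * factor
    return x[:usable].reshape(-1, factor).mean(axis=1)


def multiscale_dtw(a, b, scales=(1, 2, 4)):
    """Assumed multi-scale distance: mean DTW distance over several resolutions."""
    return float(np.mean([dtw_distance(downsample(a, s), downsample(b, s)) for s in scales]))


def cluster_series(series, n_clusters, scales=(1, 2, 4)):
    """Group series by agglomerative clustering on pairwise multi-scale DTW distances."""
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = multiscale_dtw(series[i], series[j], scales)
    # linkage expects a condensed distance vector, hence squareform.
    tree = linkage(squareform(dist), method="average")
    # Cut the dendrogram so that at most `n_clusters` flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Calling `cluster_series(list_of_arrays, n_clusters=2)` returns one integer label per input series. In a pipeline like the one the abstract outlines, a separate forecaster would then be trained on the series sharing each label. The quadratic pairwise loop is only practical for small collections; a distributed setting would need to partition or approximate it.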
Gaurav Sharma, Kailash Chandra Bandhu, "Big Data Time Series Forecasting Using Pattern Sequencing Similarity", International Journal of Computer Network and Information Security (IJCNIS), Vol. 17, No. 3, pp. 18-34, 2025. DOI: 10.5815/ijcnis.2025.03.02