A Comparative Study of Statistical (SARIMA) Vis-À-Vis Some Traditional Machine-Learning and Deep-Learning Techniques to Forecast Malaria Incidences in Kolkata of India

pp. 68-83


Author(s)

Krishnendra Sankar Ganguly (1), Krishna Sankar Ganguly (2), Ambar Dutta (3)

1. Ernst & Young GDS, Kolkata, India

2. The Kolkata Municipal Corporation, Kolkata, India

3. Amity Institute of Information Technology, Amity University, Kolkata, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2025.05.06

Received: 25 Apr. 2025 / Revised: 19 Jun. 2025 / Accepted: 14 Aug. 2025 / Published: 8 Oct. 2025

Index Terms

Computational Epidemiology, Public Health, Machine Learning, Deep Learning, Real-time Outbreak and Disease Surveillance System, LSTM, GRU, SARIMA, RF Regressor, Non-linear SVM Regressor

Abstract

This research studies an interdisciplinary approach to data analysis, combining Statistical, Machine-Learning (ML), and Deep-Learning (DL) techniques, to improve the accuracy of Time-Series Forecasting in the Computational Epidemiology domain of Public Health, specifically the prediction of Malaria incidences, so that a Real-time Outbreak and Disease Surveillance (RODS) system can generate accurate alerts. Two Non-linear Deep-Learning techniques, viz., Long Short-Term Memory (LSTM) [a subclass of Recurrent Neural Network (RNN)] and Gated Recurrent Unit (GRU), and two Non-linear Machine-Learning techniques, viz., Random Forest Regressor and Non-linear Support Vector Machine Regressor, are compared against the traditional Statistical linear SARIMA model in forecasting a longitudinal data-set of malaria incidences. While SARIMA and other traditional Autoregressive (AR) models, which require fewer parameters, undergo limited training and have limited prediction power, the ML and DL models show profound and persistent performance improvement, handle noise and missing values better, and can perform multi-step forecasts. Moreover, over-fitting can be combated by introducing densely connected residual links in the ML/DL networks.
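As a minimal sketch of the kind of comparison setup the abstract describes, the snippet below shows how a monthly incidence series might be framed as a supervised learning problem (sliding lag windows, the input shape that regressors such as Random Forest, SVR, LSTM, or GRU typically consume) and how a multi-step forecast could be scored on a held-out tail. The synthetic series, the helper names `make_windows`, `seasonal_naive`, and `rmse`, the 12-month season, the 48/12 split, and the use of RMSE are all assumptions made for illustration; the abstract does not state the paper's actual data, features, or error metrics.

```python
import math

def make_windows(series, n_lags):
    """Turn a 1-D series into (X, y) pairs: n_lags past values -> next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

def seasonal_naive(train, horizon, season=12):
    """Multi-step baseline: repeat the last observed seasonal cycle."""
    start = len(train) - season
    return [train[start + (h % season)] for h in range(horizon)]

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Synthetic monthly counts with a yearly cycle and a slow upward step
# (hypothetical values, NOT the paper's Kolkata data).
series = [100 + 50 * math.sin(2 * math.pi * m / 12) + 2 * (m // 12)
          for m in range(60)]
train, test = series[:48], series[48:]

# Lag windows that an ML/DL regressor would be trained on.
X, y = make_windows(train, n_lags=12)

# Baseline 12-step-ahead forecast and its error on the held-out year.
forecast = seasonal_naive(train, horizon=len(test))
score = rmse(test, forecast)
```

Any of the models named in the abstract could then be fitted on `X` and `y` and scored with the same `rmse` call on the same held-out months, so that all candidates are compared on identical terms against the baseline.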

Cite This Paper

Krishnendra Sankar Ganguly, Krishna Sankar Ganguly, Ambar Dutta, "A Comparative Study of Statistical (SARIMA) Vis-À-Vis Some Traditional Machine-Learning and Deep-Learning Techniques to Forecast Malaria Incidences in Kolkata of India", International Journal of Information Technology and Computer Science (IJITCS), Vol. 17, No. 5, pp. 68-83, 2025. DOI: 10.5815/ijitcs.2025.05.06
