Automated Forecasting Approach Minimizing Prediction Errors of CPU Availability in Distributed Computing Systems


N. Chabbah Sekma (1,*), A. Elleuch (2), N. Dridi (2)

1. National Engineering School of Tunis, University of Tunis El Manar, Tunis, 1002, Tunisia

2. National School of Computer Sciences, University of Manouba, Manouba, 2010, Tunisia

* Corresponding author.


Received: 17 Jan. 2016 / Revised: 10 Mar. 2016 / Accepted: 11 May 2016 / Published: 8 Sep. 2016

Index Terms

CPU availability prediction, prediction system, multivariate time series, multi-state based prediction, volunteer computing system


Forecasting CPU availability in volunteer computing systems with a single prediction algorithm is insufficient because of the diversity of the worldwide distributed resources. In this paper, we draw up the main guidelines for developing an appropriate CPU availability prediction system for such computing infrastructures. To reduce solution time and enhance precision, we use simple prediction techniques, namely vector autoregressive (VAR) models and a tendency-based technique. We propose a predictor construction process that automatically checks the assumptions of VAR models on the time series. Three different analyses of past data are performed. For a given volunteer resource, the proposed prediction system selects the appropriate predictor using the multi-state based prediction technique, and then uses the selected predictor to forecast CPU availability indicators. We evaluated our prediction system using real traces of more than 226,000 SETI@home hosts and found that it improves prediction accuracy by around 24%.
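The central idea in the abstract, choosing for each volunteer host the predictor that best suits that host's history, can be illustrated with a small sketch. It is only an illustration under stated assumptions: it is not the authors' multi-state based technique, it substitutes a univariate least-squares AR(1) model for their VAR models, and it uses a fixed-step rule as a simplified stand-in for their tendency-based technique. The function names, the step of 0.1, the warm-up of 3 samples, and mean absolute error as the selection criterion are all hypothetical choices; CPU availability is assumed to be a fraction in [0, 1].

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A predictor maps the history observed so far to a one-step-ahead forecast.
Predictor = Callable[[List[float]], float]


def clamp(x: float) -> float:
    # Assumption: CPU availability is expressed as a fraction in [0, 1].
    return min(1.0, max(0.0, x))


def last_value(history: List[float]) -> float:
    # Naive baseline: the next value equals the latest observation.
    return history[-1]


def tendency(history: List[float], step: float = 0.1) -> float:
    # Hypothetical fixed-step tendency rule: extrapolate the direction of the
    # most recent change by a constant step (a simplification, not the paper's rule).
    if len(history) >= 2 and history[-1] > history[-2]:
        return clamp(history[-1] + step)
    if len(history) >= 2 and history[-1] < history[-2]:
        return clamp(history[-1] - step)
    return history[-1]


def ar1(history: List[float]) -> float:
    # Univariate AR(1) with intercept, x_t = c + phi * x_{t-1}, fitted by
    # ordinary least squares; a simplified stand-in for a VAR model.
    xs, ys = history[:-1], history[1:]
    if len(xs) < 2:
        return history[-1]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx < 1e-12:  # constant lagged values: the slope is undefined
        return history[-1]
    phi = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    c = my - phi * mx
    return clamp(c + phi * history[-1])


def backtest_mae(predictor: Predictor, series: List[float], warmup: int) -> float:
    # Walk forward through the series, forecasting each point from its past only.
    errors = [abs(predictor(series[:t]) - series[t]) for t in range(warmup, len(series))]
    return mean(errors)


def select_predictor(
    series: List[float], candidates: Dict[str, Predictor], warmup: int = 3
) -> Tuple[str, Dict[str, float]]:
    # Keep the candidate with the lowest backtested mean absolute error;
    # ties go to the candidate listed first.
    scores = {name: backtest_mae(p, series, warmup) for name, p in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


CANDIDATES: Dict[str, Predictor] = {
    "last_value": last_value,
    "tendency": tendency,
    "ar1": ar1,
}

if __name__ == "__main__":
    # Synthetic host that alternates between mostly busy and mostly idle.
    host = [0.2, 0.8] * 6
    name, scores = select_predictor(host, CANDIDATES)
    print(name, {k: round(v, 3) for k, v in scores.items()})
    print("next forecast:", round(CANDIDATES[name](host), 3))
```

On the alternating synthetic host, the AR(1) candidate wins because the lag pairs satisfy x_t = 1 - x_{t-1} exactly, while the fixed-step tendency rule keeps extrapolating in the wrong direction. A steadily rising host would instead favour the tendency rule or AR(1) over the last-value baseline, which is the kind of per-host difference that motivates selecting predictors rather than relying on a single algorithm.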

Cite This Paper

N. Chabbah Sekma, A. Elleuch, N. Dridi, "Automated Forecasting Approach Minimizing Prediction Errors of CPU Availability in Distributed Computing Systems", International Journal of Intelligent Systems and Applications (IJISA), Vol.8, No.9, pp.8-21, 2016. DOI:10.5815/ijisa.2016.09.02


[1]D. Kondo, A. Andrzejak and D. P. Anderson, “On Correlated Availability in Internet-distributed Systems”, Proceedings of the 9th IEEE/ACM International Conference on Grid Computing, Tsukuba, Japan, pp. 276-283, 2008.
[2]B. Javadi, D. Kondo, J.M. Vincent, and D. P. Anderson, “Discovering Statistical Models of Availability in Large Distributed Systems: an Empirical Study of SETI@home”, IEEE Transactions on Parallel & Distributed Systems, IEEE Computer Society 2011, vol. 22, no. 11, pp. 1896-1903, 2011.
[3]R. Wolski, N.T. Spring, and J. Hayes, “The Network Weather Service: a Distributed Resource Performance Forecasting System for Metacomputing”, Journal of Future Generation Computing Systems, Elsevier Science Publishers B. V., vol. 15, No. 5-6, pp. 757-768, 1999.
[4]J. Liang, J. Cao, J. Wang, and Y. Xu, “Long-term CPU Load Prediction”. In Proceedings of the 9th IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC ’11), Sydney, NSW, pp. 23–26, 2011.
[5]A. Amin, L. Grunske, and A. Colman, “An automated Approach to Forecasting QoS Attributes Based on Linear and Non-linear Time Series Modeling”, Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), Essen, Germany, pp.130-139, 2012.
[6]S. Rubab, M. F. B. Hassan and A. K. B. Mahmood, “A Review on Resource Availability Prediction Methods in Volunteer Grid Computing”, IEEE International Conference on Control System, Computing and Engineering (ICCSCE), Penang, Malaysia, pp. 478-483, November 2014.
[7]N. Chabbah Sekma, A. Elleuch and N. Dridi,“Cross-correlation Analyses Toward a Prediction System of CPU Availability in Volunteer Computing System”, in the IEEE International Conference on Industrial Engineering and Systems Management (IESM15), Seville, Spain, pp. 184-192, October 2015.
[8]Jian Zhang and Renato J. Figeiredo, “Learning-aided Predictor Integration for System Performance Prediction”, Journal of Cluster Computing, Springer US, vol. 10, no. 4, pp. 425—442, 2007.
[9]N. Chabbah Sekma, A. Elleuch and N. Dridi,“Prediction of CPU Availability in Volunteer Computing Systems using Multivariate Time Series Modeling”, in the 45th International Conference on Computers and Industrial Engineering (CIE45), Metz, France, in press, October 2015.
[10]J. Liang, K. Nahrstedt, and Y. Zhou, “Adaptive Multi-ressource Prediction in Distributed Resource Sharing Environment” In IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), pp. 293-300, 2004.
[11]K. B. Bey, F. Benhammadi, Z. Gessoum and A. Mokhtari, “CPU Load Prediction using Neuro-fuzzy and Bayesian Inferences”, Neurocomputing, Elsevier, vol. 74, no. 10, pp. 1606—1616, 2011.
[12]H. Lütkepohl, Introduction to Multiple Time Series Analysis, 1st ed. Berlin, Germany: Springer Publishing Company, Incorporated, Springer Berlin Heidelberg; 2005.
[13]P. A. Dinda, and D. R. O’Hallaron, “Host Load Prediction Using Linear Models”. Journal of Cluster Computing, Kluwer Academic Publishers, vol. 3, no. 4, pp. 265-280, 2000.
[14]G.E.P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. 1st ed. San Francisco: Holden-Day, Incorporated, 1976.
[15]S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy, “An Analysis of Internet Content Delivery Systems”. SIGOPS Operating Systems Review [OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation, 2002], ACM, vol. 36, no. SI, pp. 315-327. December 2002.
[16]J. Douceur, “Is Remote Host Availability Governed by a Universal Law?”, SIGMETRICS Performance Evaluation Review, ACM, vol. 31, no. 3, pp. 25–29, December 2003.
[17]R. Bhagwan, S. Savage and G.M. Voelker, “Understanding Availability”. In: Proceedings of the 2nd IPTPS, Berkeley, California, pp. 256-267.
[18]J. R. Douceur and R. Wattenhofer, “Optimizing File Availability in a Secure Serverless Distributed File System”. In: Proceedings of 20th Symposium on Reliable Distributed Systems (SRDS), New Orleans, LA, pp. 4-13, October 2001.
[19]D. Nurmi, J. Brevik and R. Wolski, “Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments”. In: Proceedings of 11th International Euro-Par Parallel Processing, Lisbon, Portugal, pp. 432-441, 2005.
[20]A. Benoit, Y. Robert, A. Rosenberg and F. Vivien, “Static Strategies for Worksharing with Unrecoverable Interruptions”. In: IEEE International Symposium on Parallel Distributed Processing (IPDPS 2009), Rome, Italy, pp. 1-12, May 2009.
[21]J. D. Sonnek, M. Nathan, A. Chandra, and J. B. Weissman, “Reputation Based Scheduling on Unreliable Distributed Infrastructures”. In: 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), Lisboa, Portugal, pp. 30, July 2006.
[22]A. Andrzejak, D. Kondo and D. P. Anderson, “Ensuring Collective Availability in Volatile Resource Pools via Forecasting”. In: 19th IEEE/IFIP Distributed Systems: Operations and Management (DSOM-2008), Samos Island, Greece, pp. 149-161, September 2008.
[23]B. Javadi, K. Matawie and D. P. Anderson, “Modeling and Analysis of Resources Availability inVolunteer Computing Systems”, In: IEEE 32nd International Performance Computing and Communications Conference (IPCCC), San Diego, CA, pp. 1-9, ), December 2013.
[24]L. Yang, I. Foster, and J. Schopf, “Homeostatic and Tendency-Based CPU Load Prediction”, In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, pp. 42-50, April 2003.
[25]Y. Zhang, W. Sun and Y. Inoguchi, “Predict Task Running Time in Grid Environments Based on CPU Load Predictions”, Journal of Future Generation Computer Systems, Springer, vol. 24, no. 6, pp. 489-497, June 2008.
[26]A. Andrzejak, P. Domingues, and L. Silva, “Predicting Machine Availabilities in Desktop Pools”. In: 10th IEEE/IFIP Network Operations and Management Symposium (NOMS 2006), Vancouver, Canada, pp. 1-4, April 2006.
[27]H. Prem and N. R. S. Raghavan, “A Support Vector Machine Based Approach for Forecasting of Network Weather Services”. Journal of Grid Computing, Springer, vol. 4, no. 1, pp. 89–114, March 2006.
[28]Z. Li, C. Wang, H. Lv and T. Xu, “Research on CPU Workload Prediction and Balancing in Cloud Environment”, International Journal of Hybrid Information Technology, vol. 8, no. 2, pp. 159-172, 2015.
[29]J. W. Mickens and B. D. Noble, “Exploiting Availability Prediction in Distributed Systems”, In: Proceedings of the 3rd conference on Networked Systems Design & Implementation (NSDI'06), San Jose, CA, pp. 6, May 2006.
[30]X. Ren, S. Lee, R. Eigenmann, and S. Bagchi, “Prediction of Resource Availability in Fine-Grained Cycle Sharing Systems Empirical Evaluation”, Journal of Grid Computing, Springer, vol. 5, no. 2, pp. 173–195, September 2007.
[31]B. Rood and M. Lewis, “Grid Resource Availability Prediction-Based Scheduling and Task Replication”, Journal Grid Computing, Springer, Dordrecht, vol. 7, no. 4, pp. 479-500, 2009.
[32]R. E. Maleki, A. Mohammadkhan, H. Y. Yeom and A. Movaghar, “Combined Performance and Availability Analysis of Distributed Resources in Grid Computing”, Journal of Supercomputing, Kluwer Academic Publishers, vol. 69, no. 2, pp. 827-844, August 2014.
[33]G. M. Ljung and G. E. P. Box, “On a Measure of a Lack of Fit in Time Series Models”, Biometrika, vol. 65, no. 2, pp. 297–303, 1978.
[34]D. Kwiatkowski, P.C.B. Philips, P. Schmidt and Y. Shin, “Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root”, Journal of Econometrics, vol. 54, no. 1-3, pp. 159—178, December 1992.
[35]C. W. J. Granger, “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods”, Econometrica, vol. 37, no. 3, pp. 424-438, 1969.
[36]H. Akaike, “A Bayesian Analysis of the Minimum AIC Procedure”, Annals of the Institute of Statistical Mathematics, vol. 30, pp. 9-14, 1978.
[37]D. P. Anderson, “BOINC: A System for Public-Resource Computing and Storage”. In: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing (GRID '04), Pittsburgh, USA, p. 4-10, November 2004.
[38]Arabi E. keshk, Ashraf B. El-Sisi, Medhat A. Tawfeek, “Cloud Task Scheduling for Load Balancing based on Intelligent Strategy”, IJISA, vol.6, no. 5, pp. 25-36, April 2014.
[39]Failure Trace Archive (FTA). INRIA in the context of the ALEAE project., February 2016.
[40]Gnu Regression, Econometrics and Time-series Library (GRETL)., February 2016.