Ranking Grid-sites based on their Reliability for Successfully Executing Jobs of Given Durations

pp. 9-15



Farrukh Nadeem 1,*

1. Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2015.05.02

Received: 6 Oct. 2014 / Revised: 2 Jan. 2015 / Accepted: 11 Feb. 2015 / Published: 8 Apr. 2015

Index Terms

The Grid, Grid-site availability, Grid-site reliability, Job success rate, Reliability modeling


Today’s Grids include resources (referred to as Grid-sites) from different domains, including dedicated production resources, resources from university labs, and even P2P environments. High-level Grid services, such as schedulers and resource managers, need to know the reliability of the available Grid-sites in order to select the most suitable ones. Modeling the reliability of a Grid-site for the successful execution of a job requires predicting both the Grid-site's availability for the job's execution duration and the possibility that the job executes successfully. Predicting Grid-site availability is complex because of differing availability patterns, the resource-sharing policies implemented by resource owners, the nature of the domain a resource belongs to (e.g., P2P), and its maintenance. To address this, we model the reliability of a Grid-site in terms of its predicted availability and the possibility of job success. Our availability predictions incorporate past Grid-site availability patterns using pattern recognition methods. To estimate the possibility of job success, we consider historical traces of job execution. Experiments conducted on a trace from a real Grid demonstrate the effectiveness of our approach for ranking Grid-sites by their reliability for executing jobs successfully.
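The abstract describes reliability as the combination of two estimates: whether a site stays available for the whole job duration, and whether a job of that duration succeeds there. The Python sketch below illustrates that combination under simplifying assumptions and is not the author's method. In particular, availability is estimated here as a plain empirical frequency over past time windows rather than with the pattern-recognition predictors the paper describes; job success is estimated from past jobs of similar duration; the two estimates are multiplied on the assumption that they are independent. The names `SiteHistory`, `availability_for`, `success_rate_for`, `reliability`, and `rank_sites`, the slot-based trace format, and the 25% duration tolerance are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteHistory:
    """Hypothetical per-site history (not the paper's data format)."""
    name: str
    # Availability trace: one entry per fixed-length time slot, True if the site was up.
    availability: list[bool]
    # Job trace: (duration in slots, whether the job succeeded).
    jobs: list[tuple[int, bool]]


def availability_for(trace: list[bool], duration: int) -> float:
    """Fraction of past windows of `duration` slots in which the site stayed up throughout.

    A plain empirical frequency, used here as a stand-in for the paper's
    pattern-recognition availability predictors.
    """
    windows = len(trace) - duration + 1
    if duration <= 0 or windows <= 0:
        return 0.0
    up = sum(all(trace[i:i + duration]) for i in range(windows))
    return up / windows


def success_rate_for(jobs: list[tuple[int, bool]], duration: int,
                     tolerance: float = 0.25) -> float:
    """Success rate of past jobs whose duration is within +/- tolerance of `duration`."""
    similar = [ok for d, ok in jobs if abs(d - duration) <= tolerance * duration]
    if not similar:
        return 0.0  # no comparable history: treat the site as unreliable
    return sum(similar) / len(similar)


def reliability(site: SiteHistory, duration: int) -> float:
    # Assumes availability and job success are independent, so the estimates multiply.
    return availability_for(site.availability, duration) * success_rate_for(site.jobs, duration)


def rank_sites(sites: list[SiteHistory], duration: int) -> list[SiteHistory]:
    """Order sites from most to least reliable for a job of the given duration."""
    return sorted(sites, key=lambda s: reliability(s, duration), reverse=True)


# Toy histories (invented for illustration only).
sites = [
    SiteHistory("site-a", [True] * 8 + [False] + [True] * 3,
                [(4, True), (4, True), (5, False), (3, True)]),
    SiteHistory("site-b", [True, False] * 6, [(4, True), (4, True)]),
]

for site in rank_sites(sites, duration=4):
    print(site.name, round(reliability(site, 4), 3))
# prints:
# site-a 0.417
# site-b 0.0
```

Here site-b's past jobs all succeeded, yet it ranks last because it was never up for four consecutive slots. Multiplying the two estimates is what lets availability for the given duration dominate the ranking; a closer implementation would replace `availability_for` with a proper availability predictor.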

Cite This Paper

Farrukh Nadeem, "Ranking Grid-sites based on their Reliability for Successfully Executing Jobs of Given Durations", International Journal of Computer Network and Information Security (IJCNIS), vol. 7, no. 5, pp. 9-15, 2015. DOI: 10.5815/ijcnis.2015.05.02

