

Performability Analysis of a Parallel Service Considering Multiple Types of Failures

Volume 13, Number 3, May 2017 - SC 71 - pp. 330-333
DOI: 10.23940/ijpe.17.03.p9.330333

Shengji Yu and Xiwei Qiu

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

(Submitted on January 22, 2017; First Revised on March 1, 2017; Final Revised on April 18, 2017; Accepted on April 19, 2017)


Parallel computing is an important way to achieve high throughput when serving user requests, and thus to improve performance. A parallel computing service can be realized by hosting multiple copies of the software, all performing the same service tasks, on different physical machines running in parallel. However, the execution of the software may be interrupted by various kinds of failures, including software failures, hardware failures, and common cause failures (CCFs) of co-located copies of the software caused by failure of the host machine. To analyze the performability of a parallel service, the unexpected changes in performance caused by random failures and the subsequent recovery process must be taken into account. This paper presents a theoretical modeling approach that incorporates Markov reward models to analyze the performability of a parallel service; to ensure high fidelity, the approach considers software failures, hardware failures, and common cause failures. Simulation results are presented to verify the new model.
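
For readers unfamiliar with Markov reward models, the following is a minimal sketch of the general idea, not the model developed in the paper. It assumes a single host running n copies of the software, exponentially distributed failure and repair times, and a steady-state expected throughput as the performability measure. All names and parameters (lam_s, mu_s, lam_h, mu_h, rate_per_copy) are hypothetical and chosen only for illustration.

```python
def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(Q)
    # Transpose Q so that each row is one balance equation, then replace
    # the last (redundant) equation with the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def build_generator(n, lam_s, mu_s, lam_h, mu_h):
    """Generator matrix of an assumed (hypothetical) failure/repair model.

    States 0..n : number of working copies while the host is up.
    State  n+1  : host is down, so all co-located copies are down (CCF).
    """
    size = n + 2
    Q = [[0.0] * size for _ in range(size)]
    for k in range(n + 1):
        if k > 0:
            Q[k][k - 1] = k * lam_s        # one working copy fails
        if k < n:
            Q[k][k + 1] = (n - k) * mu_s   # one failed copy is recovered
        Q[k][n + 1] = lam_h                # host fails: common cause failure
    Q[n + 1][n] = mu_h                     # host repaired, all copies restarted
    for i in range(size):
        Q[i][i] = -sum(Q[i])               # each row of a generator sums to zero
    return Q


def expected_throughput(n, lam_s, mu_s, lam_h, mu_h, rate_per_copy):
    """Steady-state expected reward; the reward in state k is k * rate_per_copy."""
    pi = steady_state(build_generator(n, lam_s, mu_s, lam_h, mu_h))
    return sum(pi[k] * k * rate_per_copy for k in range(n + 1))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    print(expected_throughput(3, 0.1, 0.9, 0.01, 0.99, 10.0))
```

Attaching the reward k * rate_per_copy to each state is what turns the availability model into a performability model: the result reflects both how often copies are down and how much service capacity is lost while they are.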



