Availability Improvement Model of (n,k) Cluster Systems using Software Rejuvenation

소프트웨어 재활기법을 적용한 (n,k) 클러스터 시스템의 가용도 향상 모델

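  • 이재성 (Software Team, Development Group, Telson Electronics Co., Ltd.)
  • 박기진 (Department of Software, Anyang University)
  • 강창훈 (Department of Multimedia, Keukdong Information College)
  • 박범주 (Graduate School of Information and Communication, Ajou University)
  • 김성수 (Graduate School of Information and Communication, Ajou University)
  • Published: 2003.06.01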

Abstract

Internet-based computer systems must provide both high availability and high performance. Cluster technology has been used to obtain availability and performance simultaneously. In general, high-availability cluster systems tolerate the failure of a cluster node and resolve it cost-effectively. In this paper, we study the availability and downtime cost of (n,k) cluster systems. Taking performance into account, we model the state transitions of (n,k) cluster systems and apply a software rejuvenation technique to improve system availability. We find that software rejuvenation can be used to improve the availability of (n,k) cluster systems.

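Internet-based systems must provide high availability and high performance, and cluster system technology is emerging as one solution to this requirement. The main purpose of using a cluster system is to secure both performance and availability, and a high-availability cluster system cost-effectively handles faults that occur in some of its constituent nodes. This paper studies availability improvement and loss-cost analysis for (n,k) cluster systems with performance taken into account. We propose an availability model of (n,k) cluster systems to which software rejuvenation is applied, and find that software rejuvenation is one of the useful techniques for improving availability in systems that require high availability.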
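To make the idea of a state-transition availability model concrete, the following is a minimal sketch in Python. It is not the (n,k) cluster model of this paper, whose states and rates are not given in the abstract. It assumes instead a generic four-state, single-node Markov chain of the kind commonly used in the software rejuvenation literature (robust, failure-probable, failed, rejuvenating). All function names, parameter names, and rate values are hypothetical and chosen only for illustration.

```python
def steady_state(aging, failure, repair, rejuv_trigger, rejuv_done):
    """Steady-state probabilities of an assumed 4-state single-node chain.

    Assumed transitions (all rates per hour, all hypothetical):
      robust           -> failure-probable : aging
      failure-probable -> failed           : failure
      failed           -> robust           : repair
      failure-probable -> rejuvenating     : rejuv_trigger (0 disables rejuvenation)
      rejuvenating     -> robust           : rejuv_done
    Returns (p_robust, p_probable, p_failed, p_rejuv).
    """
    # Solve the balance equations relative to p_robust = 1, then normalize.
    p_robust = 1.0
    p_probable = p_robust * aging / (failure + rejuv_trigger)
    p_failed = p_probable * failure / repair
    p_rejuv = p_probable * rejuv_trigger / rejuv_done
    total = p_robust + p_probable + p_failed + p_rejuv
    return (p_robust / total, p_probable / total,
            p_failed / total, p_rejuv / total)


def availability(probs):
    """The node is up in the robust and failure-probable states."""
    p_robust, p_probable, _, _ = probs
    return p_robust + p_probable


def downtime_cost(probs, hours, cost_failed, cost_rejuv):
    """Expected downtime cost over `hours`, with separate hourly costs
    for unplanned (failed) and planned (rejuvenating) downtime."""
    _, _, p_failed, p_rejuv = probs
    return hours * (p_failed * cost_failed + p_rejuv * cost_rejuv)


if __name__ == "__main__":
    # Hypothetical rates: aging every 240 h, failure 720 h after aging,
    # 2 h repair, 10 min rejuvenation, rejuvenation triggered every 336 h.
    base = dict(aging=1 / 240, failure=1 / 720, repair=0.5, rejuv_done=6.0)
    without = steady_state(rejuv_trigger=0.0, **base)
    with_rejuv = steady_state(rejuv_trigger=1 / 336, **base)
    for label, probs in (("no rejuvenation", without),
                         ("rejuvenation", with_rejuv)):
        print(label, availability(probs),
              downtime_cost(probs, 8760, cost_failed=1000.0, cost_rejuv=100.0))
```

In this simplified chain, rejuvenation trades unplanned downtime for planned downtime, so whether availability and downtime cost improve depends on the rates and hourly costs chosen. A cluster model would additionally track how many nodes are up and whether the remaining nodes can carry the load.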
