Fault detection using heartbeat signals in real-time distributed systems

  • 문원식 (Department of Convergence Software, Pyeongtaek University)
  • Received : 2018.08.20
  • Accepted : 2018.09.11
  • Published : 2018.09.30

Abstract

Communication in a real-time distributed system must be highly reliable. To develop a highly reliable group communication protocol, potential faults must be identified in advance, and when a fault occurs it must be detected and the necessary action taken. The existing detection method based on acknowledgments (Acks) and time-outs is unsuitable for real-time systems because of the overhead incurred when an Ack is not received. Therefore, a group communication message in a real-time distributed processing system should be delivered either to all receiving processors or to none of them. This paper proposes a fault detection technique applicable to ring-based real-time distributed systems, evaluates it experimentally, and analyzes the results, showing that reliable message transmission within deadlines can be ensured. The experiment showed that the shorter the heartbeat signal period, the shorter the fault detection propagation time, that is, the time it takes for the other nodes to detect the failure of a node.
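The abstract does not spell out the protocol, so the following is only a minimal sketch of how ring-based heartbeat detection might be modeled, to illustrate why a shorter heartbeat period shortens the fault detection propagation time. It assumes (none of this comes from the paper) that each node sends a heartbeat to its successor every period T, that the successor declares its predecessor failed after k periods without a heartbeat, and that the detector then forwards a failure notice hop by hop around the ring with a fixed one-hop delay d. The names `RingConfig`, `detection_time_ms`, and `propagation_time_ms` and all parameter values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical analytical model of ring heartbeat detection.
# An illustrative assumption, not the protocol described in the paper.


@dataclass(frozen=True)
class RingConfig:
    """Hypothetical ring parameters (illustrative, not taken from the paper)."""
    n_nodes: int       # N: nodes in the logical ring
    period_ms: int     # T: heartbeat period
    miss_limit: int    # k: periods without a heartbeat before declaring failure
    hop_delay_ms: int  # d: one-hop message delay


def detection_time_ms(cfg: RingConfig, crash_ms: int) -> int:
    """Instant at which the crashed node's successor declares it failed.

    Heartbeats are assumed to be sent at t = 0, T, 2T, ...; one sent exactly
    at crash_ms is assumed to have gone out. The successor declares failure
    once k * T has elapsed since the last heartbeat it received.
    """
    last_sent = (crash_ms // cfg.period_ms) * cfg.period_ms
    last_arrival = last_sent + cfg.hop_delay_ms
    return last_arrival + cfg.miss_limit * cfg.period_ms


def propagation_time_ms(cfg: RingConfig, crash_ms: int) -> int:
    """Time from the crash until every surviving node knows about it.

    The detector (the crashed node's successor) forwards a failure notice
    around the ring. The crashed node's predecessor is the last survivor
    to be informed, n_nodes - 2 hops away from the detector.
    """
    if cfg.n_nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    all_informed = detection_time_ms(cfg, crash_ms) + (cfg.n_nodes - 2) * cfg.hop_delay_ms
    return all_informed - crash_ms


if __name__ == "__main__":
    # Same ring and crash instant, three different heartbeat periods.
    for period in (100, 200, 400):
        cfg = RingConfig(n_nodes=8, period_ms=period, miss_limit=2, hop_delay_ms=5)
        print(f"T = {period} ms -> propagation time {propagation_time_ms(cfg, crash_ms=1050)} ms")
```

With these made-up parameters the script prints propagation times of 185, 385, and 585 ms for T = 100, 200, and 400 ms. These figures are not the paper's experimental results; they only reproduce the qualitative trend the abstract reports. In this model the successor waits between (k - 1)T and kT, plus d, after the crash before declaring failure, so the wait scales with T, while halving T doubles the number of heartbeat messages sent per unit time.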
