Reliability Analysis and Fault Tolerance Strategy of TMR Real-time Control Systems

(Korean title: Fault Tolerance Techniques and Reliability Analysis of TMR Real-time Control Systems)


Abstract

In this paper, we propose a Triple Modular Redundancy (TMR) control system equipped with a checkpointing strategy. In this system, a fault in a single processor is masked, and faults in two or more processors are detected at each checkpoint. When faults are detected, rollback recovery is activated to recover from them. A conventional TMR control system cannot tolerate faults in two or more processors, whereas the proposed system can effectively cope with both correlated and independent faults in two or more processors. We develop a reliability model for this TMR control system under correlated and independent transient faults and derive its reliability equation. We then investigate the number of checkpoints that maximizes reliability.
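The trade-off between checkpoint count and reliability can be explored numerically. The sketch below is a simplified, hypothetical model, not the paper's reliability equation. It assumes: each of the three processors suffers independent transient faults as a Poisson process with rate `lam`; correlated faults that strike two or more processors at once arrive with rate `mu`; the task of length `task_time` is split into `n` equal intervals, each followed by a checkpoint costing `ckpt_overhead`; a single faulty processor is masked by voting and its state is restored at the checkpoint; an interval in which two or more processors were hit is detected at its checkpoint and re-executed in full; and reliability is the probability that the task finishes by a hard `deadline`. All names are illustrative.

```python
import math


def segment_failure_prob(seg_time: float, lam: float, mu: float) -> float:
    """Probability that one checkpoint interval must be rolled back.

    Hypothetical model: independent faults per processor with rate `lam`,
    correlated multi-processor faults with rate `mu`. One faulty processor
    is masked by majority voting, so the interval fails only if two or
    more processors are hit within it.
    """
    q = 1.0 - math.exp(-lam * seg_time)       # a given processor is hit at least once
    p_indep = 3 * q**2 * (1 - q) + q**3        # two or three processors hit independently
    p_corr = 1.0 - math.exp(-mu * seg_time)    # at least one correlated fault occurs
    return 1.0 - (1.0 - p_indep) * (1.0 - p_corr)


def reliability(n: int, task_time: float, deadline: float,
                ckpt_overhead: float, lam: float, mu: float) -> float:
    """P(task completes by the deadline) using n equal checkpoint intervals."""
    seg = task_time / n + ckpt_overhead        # interval length including checkpoint cost
    slack = deadline - n * seg
    if slack < 0:
        return 0.0                             # deadline missed even without faults
    k = int(slack // seg)                      # number of rollbacks that fit in the slack
    p = segment_failure_prob(seg, lam, mu)
    # Failed attempts before the n-th successful interval follow a
    # negative binomial distribution; sum P(failures = j) for j <= k.
    return sum(math.comb(n + j - 1, j) * (1 - p) ** n * p ** j
               for j in range(k + 1))


def best_checkpoint_count(task_time: float, deadline: float, ckpt_overhead: float,
                          lam: float, mu: float, n_max: int = 200) -> int:
    """Exhaustive search over n = 1..n_max for the reliability-maximizing count."""
    return max(range(1, n_max + 1),
               key=lambda n: reliability(n, task_time, deadline,
                                         ckpt_overhead, lam, mu))


if __name__ == "__main__":
    # Example parameters (arbitrary, for illustration only).
    for n in (1, 5, 10, 20, 50):
        print(n, reliability(n, 100.0, 130.0, 0.5, 1e-3, 1e-4))
    print("best n:", best_checkpoint_count(100.0, 130.0, 0.5, 1e-3, 1e-4))
```

Under these assumptions, shorter intervals reduce the chance that two processors are hit within one interval and make each rollback cheaper, but every extra checkpoint adds overhead and consumes slack before the deadline; the exhaustive search simply picks the best point on that curve.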
