• Title/Summary/Keyword: Software fault-tolerance

A study on the Design Techniques and Analysis of Fault-Tolerant Computers

  • Cho, Jai-Rip
    • Journal of Korean Society for Quality Management / v.21 no.1 / pp.78-95 / 1993
  • The art of designing and analyzing fault-tolerant computers is surveyed with special emphasis on problems of analyzing the behavior of computers that have autonomous repair capability. The survey covers the following topics: (1) general issues in computer reliability, (2) fault-tolerance state relations and requirements, (3) computational hierarchy, (4) fault characteristics, (5) fault diagnosis, (6) fault-tolerance schemes for logic networks and machines, (7) fault-coverage effects, and (8) fault-tree analysis of coverage. The paper does not cover techniques for verifying nonredundant hardware or system software designs, or for verifying the correctness of application programs.
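The abstract lists fault-coverage effects among the survey topics. As an illustration only, and not taken from the paper, the sketch below uses the standard textbook model of a duplex (two-unit) system: each unit has reliability R, and a failure of one unit is detected and survived with probability c (the coverage), so R_sys = R^2 + 2cR(1 - R). Exponential unit lifetimes and the function names are assumptions of this sketch.

```python
import math


def unit_reliability(lam: float, t: float) -> float:
    """Reliability of one unit with constant failure rate lam at time t (assumed exponential)."""
    return math.exp(-lam * t)


def duplex_reliability(r: float, c: float) -> float:
    """Reliability of a two-unit system with fault coverage c.

    The system survives if both units work (r**2), or if exactly one unit
    has failed (2*r*(1-r)) and that failure was covered (probability c).
    """
    if not (0.0 <= r <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("r and c must lie in [0, 1]")
    return r * r + 2.0 * c * r * (1.0 - r)


if __name__ == "__main__":
    r = unit_reliability(lam=1e-3, t=500.0)  # exp(-0.5), about 0.607
    for c in (1.0, 0.99, 0.9, 0.5, 0.0):
        print(f"c={c:4.2f}  R_sys={duplex_reliability(r, c):.4f}  (single unit {r:.4f})")
```

Expanding R_sys - R gives R(1 - R)(2c - 1), so under this model the duplex only outperforms a single unit when c > 0.5, which is one way to see why coverage dominates the benefit of redundancy.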

A Method for Improving Interface Fault Tolerance in the Embedded Software (임베디드 소프트웨어의 인터페이스 결함허용성 향상 기법)

  • Choi, In Hwa;Paik, Jong Ho;Hwang, Jun
    • Journal of Internet Computing and Services / v.14 no.1 / pp.31-39 / 2013
  • When a new software component is combined with reused hardware components in an embedded system, an interface discrepancy can arise between the legacy hardware and the new software. Such a discrepancy may cause various types of faults and reduce interface fault tolerance. In this paper, we propose a method to improve interface fault tolerance. First, we define a new interface discrepancy fault type that has not been addressed before; we then propose a testing method that generates test paths taking this newly defined fault type into account. In tests on a commercial broadcasting receiver, the proposed method detected about 7.9% more fatal faults than the existing testing method. In addition, because the proposed method provides software developers with test paths earlier in the software development cycle, developers can account for interface discrepancy faults in advance. Consequently, more efficient test plans can be established to improve interface fault tolerance.

A Study on Fault-Tolerance Design Methods for Nuclear Digital Control Systems (원전 디지털 제어계통을 위한 고장허용설계방법론에 관한 연구)

  • Go, Won-Seok;Choe, Jung-In
    • The Transactions of the Korean Institute of Electrical Engineers D / v.49 no.1 / pp.1-9 / 2000
  • In this paper, a fault-tolerance design method is presented for nuclear digital control systems composed of software and hardware. Reliability, availability, and safety are used as quantitative measures of fault tolerance. To implement the proposed fault-tolerance design, a prototype system was devised for the digital control systems, and a quantitative Markov model was applied. The results indicate the appropriate degree of redundancy and diversity, as well as fail-safe provisions.
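The paper applies a Markovian model to quantify reliability, availability, and safety. The sketch below is a generic example of that kind of analysis, not the paper's model: a continuous-time Markov chain for a duplicated unit with per-unit failure rate `lam` and a single repair facility with rate `mu`. Because the chain is a birth-death process, its steady state follows from balancing the probability flow across each cut. The rates, the single-repair assumption, and the function names are all assumptions of this sketch.

```python
from fractions import Fraction


def duplex_steady_state(lam, mu):
    """Steady-state probabilities of a duplex repairable system.

    States are the number of working units (2, 1, 0). Each working unit
    fails at rate lam, and one repair facility restores a unit at rate mu:
      2 -> 1 at 2*lam,  1 -> 0 at lam,  1 -> 2 at mu,  0 -> 1 at mu.
    Balancing the flow across each cut of this birth-death chain gives
      p1 = p2 * 2*lam/mu  and  p0 = p1 * lam/mu.
    """
    w2 = 1
    w1 = w2 * 2 * lam / mu
    w0 = w1 * lam / mu
    total = w2 + w1 + w0
    return {2: w2 / total, 1: w1 / total, 0: w0 / total}


def duplex_availability(lam, mu):
    """Steady-state probability that at least one unit is working."""
    p = duplex_steady_state(lam, mu)
    return p[2] + p[1]


def simplex_availability(lam, mu):
    """Single repairable unit, for comparison: A = mu / (lam + mu)."""
    return mu / (lam + mu)


if __name__ == "__main__":
    # Exact arithmetic with Fraction: lam = mu = 1 gives A = 3/5.
    print(duplex_availability(Fraction(1), Fraction(1)))
    # Assumed rates: 1e-4 failures per hour, 0.1 repairs per hour.
    print(f"simplex {simplex_availability(1e-4, 0.1):.7f}")
    print(f"duplex  {duplex_availability(1e-4, 0.1):.7f}")
```

Passing `Fraction` values keeps the arithmetic exact, which is convenient for checking a model by hand; plain floats work the same way for numeric studies.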

Comparative Study of the System Operational Method for Fault-Tolerance (Fault-Tolerance를 위한 시스템의 동작방식에 대한 비교 연구)

  • 양성현;이기서
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.11 / pp.1279-1289 / 1992
  • A fault-tolerant system improves reliability and safety by using hardware and software redundancy. Fault masking, detection, and identification techniques are applied selectively depending on the system's application area. Here, a DMR (dual modular redundancy) system is operated in standby mode and in fail-safe module mode, each with minimal hardware and software redundancy, and the reliability and safety of the two are compared. This paper also proposes an effective method of dealing with transient faults by comparing the systems' MTTFs against the transient-fault tolerance of a self-diagnosis program.
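The abstract compares DMR operating modes through their MTTFs. As a hedged illustration using standard textbook formulas rather than the paper's own models, the sketch below contrasts a cold-standby pair (the spare is unpowered and cannot fail until switched in) with a hot-standby pair (both units powered), each built from identical units with failure rate `lam` and switchover coverage `c`. It integrates R(t) numerically and checks the result against the closed forms MTTF = (1 + c)/lam for cold standby and (1 + 2c)/(2 lam) for hot standby. The rates and function names are assumptions.

```python
import math


def cold_standby_reliability(lam, c, t):
    """Primary plus one unpowered spare; the switchover succeeds with probability c.

    R(t) = P(primary survives) + c * P(primary failed, spare still working)
         = exp(-lam*t) * (1 + c*lam*t)
    """
    return math.exp(-lam * t) * (1.0 + c * lam * t)


def hot_standby_reliability(lam, c, t):
    """Two powered units; a single failure is survived with probability c."""
    r = math.exp(-lam * t)
    return r * r + 2.0 * c * r * (1.0 - r)


def mttf_numeric(reliability, lam, c, steps=200_000):
    """MTTF = integral of R(t) over [0, inf), by the trapezoidal rule.

    The integral is truncated at t = 40/lam, where exp(-lam*t) is about 4e-18.
    """
    t_max = 40.0 / lam
    h = t_max / steps
    total = 0.5 * (reliability(lam, c, 0.0) + reliability(lam, c, t_max))
    for i in range(1, steps):
        total += reliability(lam, c, i * h)
    return total * h


def mttf_cold(lam, c):
    return (1.0 + c) / lam


def mttf_hot(lam, c):
    return (1.0 + 2.0 * c) / (2.0 * lam)


if __name__ == "__main__":
    lam = 1e-3  # assumed failures per hour
    for c in (1.0, 0.9):
        print(f"c={c}: cold {mttf_numeric(cold_standby_reliability, lam, c):.1f} h "
              f"(closed form {mttf_cold(lam, c):.1f}), "
              f"hot {mttf_numeric(hot_standby_reliability, lam, c):.1f} h "
              f"(closed form {mttf_hot(lam, c):.1f})")
```

With perfect coverage (c = 1) the cold spare doubles the single-unit MTTF of 1/lam, while the hot pair reaches only 1.5/lam, because the powered spare can itself fail while waiting.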

Design of a Fault-tolerant Embedded Controller for Railway Signaling Systems

  • Cho, Yong-Gee;Lim, Jae-Sik
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / 2002.10a / pp.68.4-68 / 2002
  • This report presents an implementation of a set of reusable software components for a fault-tolerant embedded controller for railway signaling systems. These components can be used in real-time applications without reprogramming the application. The library runs under the VxWorks operating system and is oriented toward real-time embedded systems. It includes fault detection, fault containment, checkpointing, and recovery components, and it supports high-speed response to faults occurring in application software. A garbage collector, together with the VxWorks watchdog, provides both dead-task detection and removal of useless resources to avoid overflow. Control flow...
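The library described above combines fault detection, fault containment, checkpointing, and recovery, with a watchdog involved in detecting dead tasks. The sketch below shows that general pattern in plain Python; it does not use the VxWorks API, and every name in it is hypothetical. Each step runs on a copy of the last checkpoint (containment), a thread join with a timeout stands in for the watchdog (detection), and a failed step is retried from the checkpoint (recovery).

```python
import copy
import threading


class StepFailed(Exception):
    """Raised when a step raises an error or misses its watchdog deadline."""


def run_with_watchdog(step, state, timeout_s):
    """Run step(state) in a daemon thread; fail if it does not finish in time."""
    result = {}

    def target():
        try:
            result["value"] = step(state)
        except Exception as exc:  # fault detection: capture the error
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # Python cannot kill a thread, so the hung worker is abandoned.
        raise StepFailed("watchdog timeout")
    if "error" in result:
        raise StepFailed(repr(result["error"]))
    return result["value"]


def run_with_checkpoints(steps, initial_state, timeout_s=1.0, max_retries=2):
    """Execute steps in order, checkpointing state after each success.

    On a failure, roll back to the last checkpoint and retry the step.
    """
    checkpoint = copy.deepcopy(initial_state)
    log = []
    for index, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            # Fault containment: the step only ever touches a private copy.
            working = copy.deepcopy(checkpoint)
            try:
                new_state = run_with_watchdog(step, working, timeout_s)
            except StepFailed as err:
                log.append(f"step {index} attempt {attempt} failed: {err}")
                continue  # recovery: retry from the saved checkpoint
            checkpoint = copy.deepcopy(new_state)
            break
        else:
            raise RuntimeError(f"step {index} failed after {max_retries + 1} attempts")
    return checkpoint, log


if __name__ == "__main__":
    attempts = {"n": 0}

    def increment(state):
        state["count"] += 1
        return state

    def flaky(state):
        # Hypothetical step with a transient fault on its first call.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise ValueError("transient fault")
        state["count"] += 10
        return state

    final, log = run_with_checkpoints([increment, flaky], {"count": 0}, timeout_s=0.5)
    print(final)  # {'count': 11}
    print(log)
```

Because a hung Python thread cannot be terminated, the sketch simply abandons it; on a real-time OS the hung task would instead be deleted or restarted, which is where the resource cleanup mentioned in the abstract comes in.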

Switchover Time Analysis of Primary-Backup Server Systems Based on Software Rejuvenation (소프트웨어 재활기법에 기반한 주-여분 서버 시스템의 작업전이 시간 분석)

  • Lee, Jae-Sung;Park, Kie-Jin;Kim, Sung-Soo
    • The KIPS Transactions: Part A / v.8A no.2 / pp.81-90 / 2001
  • With the rapid growth of the Internet, computer systems are growing in size and complexity. To meet high availability requirements, such systems usually use both hardware and software fault-tolerance techniques. To prevent failures caused by the software-aging phenomenon that arises over long mission times, we adopt software rejuvenation, which intentionally stops and restarts the software on the servers. This returns the system to a clean, healthy state in which the probability of fault occurrence is very low. In this paper, we study how switchover time affects software rejuvenation in primary-backup server systems. Through experiments, we find that switchover time is an essential factor in deciding the rejuvenation policy.

SSR (Simple Sector Remapper): the fault-tolerant FTL algorithm for NAND flash memory

  • Lee, Gui-Young;Kim, Bumsoo;Kim, Shin-han;Jung, Byungsoo
    • Proceedings of the IEEK Conference / 2002.07b / pp.932-935 / 2002
  • In this paper, we introduce a new FTL (Flash Translation Layer) driver algorithm that tolerates power-off errors. An FTL driver is the software that provides a block device interface to upper-layer software, such as file systems or application programs, that uses flash memory as block-device storage. Flash memory is commonly used as the storage device in mobile systems because of its low power consumption and small form factor. In a mobile system, the power supply is not stable because it relies on a small battery with limited capacity, so a sudden power-off failure can occur while data is being read from or written to flash memory. A power-off failure during a write operation may leave the write incomplete, and an incomplete write leaves the data in flash memory inconsistent. To provide stable storage with flash memory in a mobile system, the FTL must tolerate power-off failures. SSR (Simple Sector Remapper) is a fault-tolerant FTL driver that provides a block device interface as well as tolerance against power-off errors.
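The abstract does not spell out how SSR itself works, so the following is a generic illustration of power-off tolerance in an FTL, not SSR's algorithm: writes go out of place to a fresh page, each page carries a sequence number, and a commit mark is written last. After a power loss, remounting rebuilds the sector map from committed pages only, so an interrupted write leaves the previous data visible. The class and method names are hypothetical, and garbage collection, erase blocks, and wear leveling are omitted.

```python
class PowerLoss(Exception):
    """Simulated sudden power-off in the middle of a write."""


class SimulatedFlash:
    """NAND-like store: a page is programmed once and never overwritten in place."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages  # None means erased

    def program(self, ppn, record):
        if self.pages[ppn] is not None:
            raise ValueError("page must be erased before it is programmed again")
        self.pages[ppn] = record


class RemappingFTL:
    """Hypothetical power-loss-tolerant sector remapper (no GC or wear leveling)."""

    def __init__(self, flash):
        self.flash = flash
        self.mount()

    def mount(self):
        """Rebuild the logical-to-physical map by scanning every page."""
        self.seq, self.next_free = 0, 0
        newest = {}
        for ppn, rec in enumerate(self.flash.pages):
            if rec is None:
                continue
            self.next_free = ppn + 1
            self.seq = max(self.seq, rec["seq"])
            if not rec["committed"]:
                continue  # torn write: its data never becomes visible
            lsn = rec["lsn"]
            if lsn not in newest or rec["seq"] > newest[lsn][0]:
                newest[lsn] = (rec["seq"], ppn)
        self.mapping = {lsn: ppn for lsn, (_, ppn) in newest.items()}

    def write(self, lsn, data, fail_before_commit=False):
        if self.next_free >= len(self.flash.pages):
            raise RuntimeError("flash full (garbage collection not modeled)")
        ppn = self.next_free  # out-of-place write to a fresh page
        self.next_free += 1
        self.seq += 1
        record = {"lsn": lsn, "seq": self.seq, "data": data, "committed": False}
        self.flash.program(ppn, record)
        if fail_before_commit:
            raise PowerLoss(f"power lost while writing sector {lsn}")
        record["committed"] = True  # models programming a commit mark last
        self.mapping[lsn] = ppn  # old page stays on flash until reclaimed

    def read(self, lsn):
        ppn = self.mapping.get(lsn)
        return None if ppn is None else self.flash.pages[ppn]["data"]


if __name__ == "__main__":
    flash = SimulatedFlash(num_pages=8)
    ftl = RemappingFTL(flash)
    ftl.write(0, b"version 1")
    try:
        ftl.write(0, b"version 2", fail_before_commit=True)
    except PowerLoss as err:
        print(err)
    ftl = RemappingFTL(flash)  # power restored: remount from flash contents
    print(ftl.read(0))  # b'version 1'
```

The key design choice is that the old copy of a sector is never touched until the new copy is fully committed, so there is no instant at which a power failure can leave neither version readable.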

Linux-based ARINC 653 Health Monitor (리눅스 기반 ARINC 653 헬스 모니터)

  • Yoon, Young-Il;Joe, Hyunwoo;Kim, Hyungshin
    • IEMEK Journal of Embedded Systems and Applications / v.9 no.3 / pp.183-191 / 2014
  • Software running on avionics systems must be highly reliable and productive. The air transport industry developed ARINC Specification 653 (ARINC 653) as a standardized software requirement for avionics computers. The specification defines the interface boundary between avionics application software and the core executive software. In ARINC 653, dependability is provided by spatial and temporal partitioning, while fault tolerance is provided by the health monitoring mechanism. Legacy real-time operating systems are used to support the ARINC 653 health monitor on integrated modular avionics (IMA). However, legacy real-time operating systems are costly, and their kernels are difficult to modify. In this paper, we propose a Linux-based ARINC 653 health monitor. The functionality supporting the ARINC 653 health monitor is implemented as a Linux kernel module, and its performance is evaluated.
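In ARINC 653, the health monitor decides how to respond to each reported error according to the level at which it is handled (process, partition, or module). The sketch below shows a table-driven dispatcher of that general shape; it is not the paper's kernel module or the specification's API. The error names are modeled loosely on ARINC 653 error codes, while the levels and recovery actions assigned to them here are arbitrary examples.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    PROCESS = "process"
    PARTITION = "partition"
    MODULE = "module"


class Action(Enum):
    IGNORE = "ignore"
    RESTART_PROCESS = "restart process"
    WARM_START_PARTITION = "warm start partition"
    COLD_START_PARTITION = "cold start partition"
    RESET_MODULE = "reset module"


# Illustrative table: error name -> (handling level, recovery action).
HM_TABLE = {
    "DEADLINE_MISSED": (Level.PROCESS, Action.RESTART_PROCESS),
    "NUMERIC_ERROR": (Level.PROCESS, Action.IGNORE),
    "MEMORY_VIOLATION": (Level.PARTITION, Action.WARM_START_PARTITION),
    "STACK_OVERFLOW": (Level.PARTITION, Action.COLD_START_PARTITION),
    "HARDWARE_FAULT": (Level.MODULE, Action.RESET_MODULE),
}


@dataclass
class HealthMonitor:
    table: dict
    # Errors missing from the table are escalated to a partition cold start.
    fallback: tuple = (Level.PARTITION, Action.COLD_START_PARTITION)
    log: list = field(default_factory=list)

    def report(self, error, partition, process=None):
        """Look up the response to an error and record the event."""
        level, action = self.table.get(error, self.fallback)
        self.log.append((error, partition, process, level.value, action.value))
        return level, action


if __name__ == "__main__":
    hm = HealthMonitor(HM_TABLE)
    print(hm.report("DEADLINE_MISSED", partition="P1", process="nav_task"))
    print(hm.report("POWER_GLITCH", partition="P2"))  # unknown error -> fallback
    for entry in hm.log:
        print(entry)
```

Keeping the policy in a table rather than in code mirrors the configuration-driven style of partitioned systems: changing how an error is handled means editing data, not the monitor.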

Fault Tolerance Design for Servo Manipulator System Operating in a Hot Cell

  • Jin, Jae-Hyun;Ahn, Sung-Ho;Park, Byung-Suk;Yoon, Ji-Sup;Jung, Jae-Hoo
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / 2003.10a / pp.2467-2470 / 2003
  • In this paper, fault-tolerant mechanisms are presented for a servo manipulator system designed to operate in a hot cell. A hot cell is a sealed, shielded room for handling radioactive materials; because it is dangerous for people to work inside it, remote operations are necessary. KAERI has developed a servo manipulator system to perform such remote operations. However, because electrical components such as servo motors are vulnerable to radiation, fault-tolerant mechanisms must be considered. For fault tolerance of the servo manipulator system, both hardware and software redundancy have been considered. For hardware, radiation-resistant electrical components such as cables and connectors have been adopted, and the motors that drive the transport have been duplicated. For software, a reconfiguration algorithm that accommodates the failure of one motor has been developed; it uses a redundant axis to recover the end effector's motion despite the motor failure.

Fault Tolerance System running on Distributed Multimedia (분산 멀티미디어에서의 결함 허용 시스템)

  • Hong, Sung-Ryong;Ko, Eung-Nam
    • Journal of Digital Contents Society / v.16 no.1 / pp.123-126 / 2015
  • This paper describes a fault-tolerance system running on distributed multimedia. We implemented an error manager service so that users participating in distributed multimedia collaborative work can see synchronized error objects in the same view as the other participants. The distributed multimedia environment is based on IP-USN (Internet Protocol - Ubiquitous Sensor Network) and M2M (machine-to-machine). The system is suited to detecting, sharing, and recovering from software errors in a distributed multimedia CSCW (Computer-Supported Cooperative Work) environment. With the error synchronization system, a group of cooperating users can synchronize error applications.