• Title/Summary/Keyword: fault tolerance information

Fault-Free Process for IT System with TRM(Technical Reference Model) based Fault Check Point and Event Rule Engine (기술분류체계 기반의 장애 점검포인트와 이벤트 룰엔진을 적용한 무장애체계 구현)

  • Hyun, Byeong-Tag;Kim, Tae-Woo;Um, Chang-Sup;Seo, Jong-Hyen
    • Information Systems Review / v.12 no.3 / pp.1-17 / 2010
  • IT systems based on a Global Single Instance (GSI) can manage a corporation's internal information, resources, and assets effectively and raise business efficiency and productivity by consolidating business processes. However, they also carry the risk that an IT system failure can paralyze the business itself and cause heavy financial losses. Many studies have addressed fault tolerance based on redundant components. The concept of fault tolerance is simple, but designing and adopting a fault-tolerant system is difficult because the type and frequency of faults are uncertain. Operational fault management, which takes effect after an IT system has been developed, is therefore becoming increasingly important alongside technical fault management. This study proposes a fault management process that includes a pre-estimation method using TRM (Technical Reference Model) check points and an event rule engine. It also reports the effect of the fault-free process, measured through a fault management system built for a representative company in the high-tech industry. After the fault-free process was adopted, the number of failures decreased by 46%, failure time decreased by 56%, and opportunity loss costs decreased by 77%.

A fault detection and recovery mechanism for the fault-tolerance of a Mini-MAP system (Mini-MAP 시스템의 결함 허용성을 위한 결함 감지 및 복구 기법)

  • Mun, Hong-Ju;Kwon, Wook-Hyun
    • Journal of Institute of Control, Robotics and Systems / v.4 no.2 / pp.264-272 / 1998
  • This paper proposes a fault detection and recovery mechanism for a fault-tolerant Mini-MAP system and provides detailed techniques for its implementation. The system considered has a dual-layer structure from the LLC sublayer down to the physical layer, so that it can cope with faults in those layers. For reliable fault detection, a redundant, hierarchical fault supervision architecture is proposed, together with an implementation technique that keeps detection stable. Fault location information comes from data reported with each fault detection and from an additional network diagnosis. Faults are recovered by the stand-by sparing method applied to a dual network composed of two equivalent networks. A network switch mechanism is proposed to achieve a reliable and stable network function. A fault-tolerant Mini-MAP system is implemented by applying the proposed fault detection and recovery mechanism.
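
The stand-by sparing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' network switch mechanism: the class `DualNetwork`, the network names "A" and "B", and the `report_fault` interface are all hypothetical, and fault detection itself is assumed to happen elsewhere.

```python
class DualNetwork:
    """Two equivalent networks: one active, one kept as a stand-by spare.

    Hypothetical illustration only; names and interface are assumptions.
    """

    def __init__(self):
        self.healthy = {"A": True, "B": True}
        self.active = "A"

    def report_fault(self, name):
        """Record a detected fault and switch over if the active network failed."""
        self.healthy[name] = False
        if name == self.active:
            spare = "B" if name == "A" else "A"
            # Switch to the spare only if it is still healthy.
            self.active = spare if self.healthy[spare] else None

    def send(self, frame):
        """Send a frame on whichever network is currently active."""
        if self.active is None:
            raise RuntimeError("no healthy network available")
        return (self.active, frame)


net = DualNetwork()
first = net.send("frame-1")   # goes out on network A
net.report_fault("A")         # supervisor reports a fault on A
second = net.send("frame-2")  # traffic now uses the spare, B
```

A fault reported on the spare while the primary is still active leaves traffic where it is; only a fault on the active network triggers a switch.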

Soft Fault Detection Using an Improved Mechanism in Wireless Sensor Networks

  • Montazeri, Mojtaba;Kiani, Rasoul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.10 / pp.4774-4796 / 2018
  • Wireless sensor networks are composed of a large number of inexpensive, tiny sensors used in areas including the military, industry, agriculture, space, and the environment. Fault tolerance, which is considered a challenging task in these networks, is defined as the ability of the system to offer an appropriate level of functionality in the event of failures. The present study proposes an intelligent throughput descent and distributed energy-efficient mechanism to improve the fault tolerance of the system against soft and permanent faults. This mechanism includes determining the intelligent neighborhood radius threshold and the intelligent neighborhood node-number threshold, customizing the base paper's algorithm for distributed systems, redefining the base paper's scenarios for the failure detection procedure to predict network behavior under soft and permanent faults, and describing cases for handling failure exception procedures. Simulation results indicate that the proposed mechanism improves network throughput, fault detection accuracy, reliability, and network lifetime relative to the base paper.
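
To show how a neighborhood radius threshold and a neighbor-count threshold can enter soft fault detection, here is a generic neighbor-comparison sketch. It is not the mechanism proposed in the paper: the median test, the function `detect_soft_faults`, and the parameters `radius`, `min_neighbors`, and `tolerance` are assumptions made for illustration.

```python
from math import dist
from statistics import median


def detect_soft_faults(nodes, radius, min_neighbors, tolerance):
    """Flag nodes whose reading deviates from the median of nearby nodes.

    nodes maps a node id to ((x, y), reading). All thresholds are
    hypothetical tuning parameters, not values from the paper.
    """
    suspected = set()
    for node_id, (position, reading) in nodes.items():
        neighbor_readings = [
            other_reading
            for other_id, (other_position, other_reading) in nodes.items()
            if other_id != node_id and dist(position, other_position) <= radius
        ]
        if len(neighbor_readings) < min_neighbors:
            continue  # too few neighbors in range to make a judgment
        if abs(reading - median(neighbor_readings)) > tolerance:
            suspected.add(node_id)
    return suspected


nodes = {
    "n1": ((0, 0), 20.1),
    "n2": ((1, 0), 20.4),
    "n3": ((0, 1), 19.8),
    "n4": ((1, 1), 35.0),  # deliberately corrupted reading
    "n5": ((9, 9), 50.0),  # isolated node with no neighbors in range
}
suspects = detect_soft_faults(nodes, radius=2.0, min_neighbors=2, tolerance=5.0)
```

In this sketch the isolated node is never flagged, because it has too few neighbors to compare against. That is one reason a neighbor-count threshold matters.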

Comparative Study of the System Operational Method for Fault-Tolerance (Fault-Tolerance를 위한 시스템의 동작방식에 대한 비교 연구)

  • 양성현;이기서
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.11 / pp.1279-1289 / 1992
  • Fault-tolerant systems improve reliability and safety by using hardware and software redundancy. Fault masking, detection, and identification techniques are applied depending on the system's application area. Here, a DMR system is operated in standby and fail-safe module modes, which require minimal hardware and software redundancy, and the reliability and safety of each mode are compared. The paper also proposes an effective method of dealing with transient faults by comparing the system's MTTF against the transient-fault tolerance capability of the self-diagnosis program.

Simulation-Based Fault Analysis for Resilient System-On-Chip Design

  • Han, Chang Yeop;Jeong, Yeong Seob;Lee, Seung Eun
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.175-179 / 2021
  • Enhancing system reliability is important in recent system-on-chip (SoC) designs, and this has led to studies on fault diagnosis and tolerance. Fault-injection (FI) techniques are widely used to measure the fault-tolerance capabilities of resilient systems. FI techniques suffer from limitations in relation to environmental conditions and system features. Moreover, hardware-based FI can cause permanent damage to the target system, because the actual circuit cannot be restored. Accordingly, we propose a simulation-based FI framework based on the Verilog Procedural Interface for measuring the failure rates of SoCs caused by soft errors. We execute five benchmark programs using an ARM Cortex M0 processor and inject soft errors using the proposed framework. The experiment has a 95% confidence level with a ±2.53% error, and confirms the reliability and feasibility of using the proposed framework for fault analysis in SoCs.
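
A toy fault-injection campaign shows how a failure rate and its margin of error relate. This is a sketch, not the paper's Verilog-based framework: `workload` is a stand-in for a benchmark program, and the injection model (one random bit flip in one input) is an assumption. The abstract does not state a sample size. The 1,500 injections below are an inference, assuming the normal-approximation margin of error with worst-case p = 0.5 and no finite-population correction, under which ±2.53% at 95% confidence corresponds to about 1,500 samples.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


def workload(values):
    """Toy stand-in for a benchmark program; some corrupted inputs are masked."""
    return max(values)


def inject_bit_flip(values, rng, width=8):
    """Return a copy of values with one random bit flipped in one random element."""
    corrupted = list(values)
    index = rng.randrange(len(corrupted))
    corrupted[index] ^= 1 << rng.randrange(width)
    return corrupted


def run_campaign(values, injections, seed=0):
    """Fraction of injections whose output differs from the fault-free (golden) run."""
    rng = random.Random(seed)
    golden = workload(values)
    failures = sum(
        workload(inject_bit_flip(values, rng)) != golden
        for _ in range(injections)
    )
    return failures / injections


rate = run_campaign([12, 200, 7, 45], injections=1500)
moe = margin_of_error(1500)
print(f"failure rate {rate:.3f} +/- {moe:.4f}")
```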

Optimal Software Release Using Time and Cost Benefits via Fuzzy Multi-Criteria and Fault Tolerance

  • Srivastava, Praveen Ranjan
    • Journal of Information Processing Systems / v.8 no.1 / pp.21-54 / 2012
  • A software development process is typically large and consists of many modules. This raises the idea of prioritizing software modules so that the important ones can be tested first. In the software testing process, time and cost constraints make it impossible to test each and every module regressively. To deal with these constraints, this paper proposes an approach based on fuzzy multi-criteria decision making that prioritizes software modules and calculates the optimal time and cost for software testing by using fuzzy logic and the fault tolerance approach.

Design of Fault-Tolerant Inductive Position Sensor (고장 허용 유도형 위치 센서 설계)

  • Paek, Sung-Kuk;Park, Byeong-Cheol;Noh, Myoung-Gyu D.
    • Transactions of the Korean Society of Mechanical Engineers A / v.32 no.3 / pp.232-239 / 2008
  • It is desirable for the position sensors used in a magnetic bearing system to provide some degree of fault tolerance, because the rotor position is needed for the feedback control that overcomes the open-loop instability. In this paper, we propose an inductive position sensor that can cope with a partial fault in the sensor. The sensor has multiple poles whose signals can be combined to sense the in-plane motion of the rotor. When a high-frequency voltage signal drives each pole of the sensor, the resulting current in the sensor coil contains information about the rotor position, which the signal processing circuit extracts. We use a magnetic circuit model of the sensor that gives the analytical relationship between the sensor output and the rotor motion. The multi-polar structure of the sensor makes it possible to introduce redundancy, which can be exploited for fault-tolerant operation. The proposed sensor is applied to a magnetically levitated turbo-molecular vacuum pump. Experimental results validate the fault-tolerance algorithm.

An Efficient Implementation of Tornado Code for Fault Tolerance

  • Lei, Jian-Jun;Kwon, Gu-In
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.13-18 / 2009
  • This paper presents an implementation procedure for the encoding and decoding algorithms of Tornado code, which can provide fault tolerance for storage and transmission systems. A degree distribution satisfying a heavy-tail distribution is produced. Based on this distribution, a good random irregular bipartite graph is obtained after many trials. This graph construction is shown to be efficient, and experiments demonstrate that the implementation performs well in terms of decoding overhead.
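
The decoding principle behind such graph-based erasure codes can be sketched with a single bipartite layer: each check symbol is the XOR of the message symbols it connects to, and erased symbols are recovered by repeatedly solving any check that has exactly one unknown. This is a simplified illustration, not the paper's implementation. The graph in `checks` is hand-picked rather than drawn from a heavy-tail degree distribution, there is no cascade of layers, and the check symbols are assumed to arrive intact.

```python
def encode(message, checks):
    """Compute one check symbol per check node as the XOR of its neighbors."""
    parity = []
    for neighbors in checks:
        value = 0
        for index in neighbors:
            value ^= message[index]
        parity.append(value)
    return parity


def peel_decode(received, parity, checks):
    """Recover erased symbols (None) by solving checks that have one unknown."""
    symbols = list(received)
    progress = True
    while progress and None in symbols:
        progress = False
        for neighbors, check_value in zip(checks, parity):
            missing = [i for i in neighbors if symbols[i] is None]
            if len(missing) != 1:
                continue
            value = check_value
            for i in neighbors:
                if symbols[i] is not None:
                    value ^= symbols[i]
            symbols[missing[0]] = value
            progress = True
    return symbols  # still contains None if decoding got stuck


# Hypothetical graph: each inner list names the message symbols one check covers.
checks = [[0, 1, 2], [1, 3], [2, 3, 4], [0, 4, 5], [3, 5]]
message = [0x41, 0x42, 0x43, 0x44, 0x45, 0x46]
parity = encode(message, checks)

received = [0x41, None, None, 0x44, 0x45, None]  # three symbols erased
decoded = peel_decode(received, parity, checks)
```

Whether peeling succeeds depends on the graph and on which symbols are lost, which is why the choice of degree distribution matters for decoding overhead.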

A New Artificial Immune System Based on the Principle of Antibody Diversity And Antigen Presenting Cell (Antibody Diversity 원리와 Antigen Presenting Cell을 구현한 새로운 인공 면역 시스템)

  • 이상형;김은태;박민용
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.4 / pp.51-58 / 2004
  • This paper proposes a new artificial immune approach to on-line hardware testing, which is an indispensable technique for fault-tolerant hardware. A novel algorithm for generating tolerance conditions is suggested, based on the principle of antibody diversity. Tolerance conditions in the artificial immune system correspond to antibodies in the biological immune system. In addition, the antigen presenting cell (APC) is realized by the Quine-McCluskey method in this algorithm, and tolerance conditions are generated through a GA (Genetic Algorithm). The suggested method is applied to the on-line monitoring of a typical FSM (a decade counter), and its effectiveness is demonstrated by computer simulation.

Fault Tolerant System Modeling based on Real-Time Object (실시간 객체 기반 결함허용 시스템 모델링)

  • Im, Hyeong-Taek;Yang, Seung-Min
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2233-2244 / 1999
  • It is essential to guarantee high reliability in embedded real-time systems, since the failure of such systems may result in large financial damage or threaten human life. Although much research has been devoted to fault-tolerant mechanisms, most of them work at the object level: they detect errors that occur in a single object and handle those errors within that object. As embedded real-time systems become larger and more complex, faults arise that object-level fault tolerance can neither detect nor tolerate. Hence, system-level fault tolerance is needed. System-level fault tolerance determines whether a system is operating normally by analyzing the status of its objects. When an error is detected, it should be capable of locating the fault and performing an appropriate recovery and reconfiguration action. In this paper, we propose RobustRTO (Robust Real-Time Object), which provides object-level fault tolerance, and RMO (Region Monitor real-time Object), which offers system-level fault tolerance. We then show how highly dependable fault-tolerant systems can be modeled with RobustRTO and RMO. The model is presented based on real-time objects.
