• Title/Summary/Keyword: Checkpointing and Recovery


Replicated Checkpointing Failure Recovery Schemes for Mobile Hosts and Mobile Support Station in Cellular Networks (셀룰라 네트워크 환경에서의 이중화 체크포인팅을 이용한 이동 호스트 및 기지국 결함 복구 기법)

  • Byun, Kyue-Sub;Kim, Jai-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.1B
    • /
    • pp.13-23
    • /
    • 2002
  • A mobile host is prone to failure because of its lack of stable storage, the low bandwidth of the wireless channel, high mobility, and limited battery life. Many researchers have studied ways to overcome these problems. For high availability in cellular networks, it is necessary to consider recovery from failures of mobile support stations as well as of mobile hosts. In this paper, we present a modified trickle scheme, based on checkpointing, for recovery from failures of mobile support stations, and we analyze and compare its performance. We propose and analyze two schemes: one waits for the recovery of the mobile support station that holds the last checkpoint, and the other searches for a new path to another mobile support station that holds the checkpoint.

Lazy Garbage Collection of Coordinated Checkpointing Protocol for Avoiding Sympathetic Rollback (동기적 검사점 기법에서 불필요한 복귀를 회피하기 위한 쓰레기 처리 기법)

  • Chung, Kwang-Sik;Yu, Heon-Chang;Lee, Won-Gyu;Lee, Seong-Hoon;Hwang, Chong-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.6
    • /
    • pp.331-339
    • /
    • 2002
  • This paper presents a garbage collection protocol for checkpoints and message logs that are saved on stable or volatile storage for fault tolerance. Previous garbage collection schemes for coordinated checkpointing protocols delete all checkpoints except the last checkpoint of each process. However, when implemented on top of a reliable communication protocol such as TCP/IP, a rollback recovery protocol based only on the last checkpoints causes sympathetic rollback. We show that old checkpoints and message logs, in addition to the last checkpoints, have to be preserved in order to replay lost messages. We define the conditions under which checkpoints and message logs kept for lost messages can be garbage collected, and present a garbage collection algorithm for checkpoints and message logs in a coordinated checkpointing protocol. Since the proposed algorithm uses process information for lost messages that is piggybacked on application messages, no additional messages are required for garbage collection. Because it relies on the checkpoint information piggybacked on sent and received messages, the proposed algorithm produces a 'lazy garbage collection effect'. This effect does not break the consistency of the whole system.

An Error Detection and Recovery System based on Multimedia Computer Supported Cooperative Work (멀티미이어 협동 작업환경에서의 오류 감지 및 복구 시스템)

  • Ko, Eung-Nam;Hwang, Dae-Joon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.5
    • /
    • pp.1330-1340
    • /
    • 2000
  • Multimedia is now applied to various real-world areas. In particular, the focus on multimedia systems and CSCW (Computer Supported Cooperative Work) has increased. In spite of this trend, however, the study of fault tolerance for CSCW has not yet fully progressed. We propose EDR_MCSCW, a system that uses software techniques to detect and recover from software errors in multimedia computer supported cooperative work environments such as DOORAE. DOORAE is a framework for supporting the development of multimedia applications for computer-based collaborative work. When an error occurs, EDR_MCSCW detects it by hooking MS-Windows API (Application Program Interface) functions. For the case in which an error is found, we present a stack-based checkpointing and recovery algorithm that removes the domino effect when recovering multimedia and CSCW applications.


Efficient Process Checkpointing through Fine-Grained COW Management in New Memory based Systems (뉴메모리 기반 시스템에서 세밀한 COW 관리 기법을 통한 효율적 프로세스 체크포인팅 기법)

  • Park, Jay H.;Moon, Young Je;Noh, Sam H.
    • Journal of KIISE
    • /
    • v.44 no.2
    • /
    • pp.132-138
    • /
    • 2017
  • We design and implement a process-based fault recovery system to increase the reliability of new memory based computer systems. A rollback point, to which a process can roll back upon a fault, is made at every context switch. In this study, a clone of the original process, which we refer to as a P-process (Persistent-process), is created as the rollback point. Such a design minimizes losses when a fault does occur. First, because rollback points are created at every context switch, the amount of lost execution is bounded. Second, as we make use of the COW (Copy-On-Write) mechanism, only those parts of the process memory state that are modified (in page units) are copied, decreasing the overhead of creating the P-process. Our experimental results show that the overhead is approximately 5% in 8 out of 11 PARSEC benchmark workloads when a P-process is created at every context switch. Even for workloads that incur considerable overhead, we show that the overhead can be reduced by increasing the P-process generation interval.
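
    The page-granular copy-on-write idea in this abstract can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the paper's kernel-level P-process mechanism: the class `CowMemory`, its method names, and the 4096-byte page size are hypothetical names and values chosen for illustration.

    ```python
    PAGE_SIZE = 4096  # assumed page size, for illustration only


    class CowMemory:
        """Toy process memory checkpointed at page granularity (hypothetical)."""

        def __init__(self, num_pages):
            # Pages are immutable bytes objects, so the snapshot can share
            # them with the live memory without copying anything.
            self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
            self.snapshot = list(self.pages)
            self.dirty = set()  # page numbers modified since the last checkpoint

        def write(self, addr, data):
            page_no, offset = divmod(addr, PAGE_SIZE)
            if offset + len(data) > PAGE_SIZE:
                raise ValueError("write crosses a page boundary")
            # Copy on write: only the touched page is duplicated.
            page = bytearray(self.pages[page_no])
            page[offset:offset + len(data)] = data
            self.pages[page_no] = bytes(page)
            self.dirty.add(page_no)

        def checkpoint(self):
            """Record a rollback point; returns how many pages were updated."""
            updated = len(self.dirty)
            for page_no in self.dirty:
                self.snapshot[page_no] = self.pages[page_no]
            self.dirty.clear()
            return updated

        def rollback(self):
            """Discard all writes made since the last checkpoint."""
            for page_no in self.dirty:
                self.pages[page_no] = self.snapshot[page_no]
            self.dirty.clear()


    mem = CowMemory(num_pages=8)
    mem.write(10, b"abc")
    pages_saved = mem.checkpoint()   # only page 0 was modified
    mem.write(5000, b"xyz")          # touches page 1
    mem.write(10, b"zzz")            # touches page 0 again
    mem.rollback()                   # both writes are undone
    ```

    In this sketch the cost of a checkpoint grows with the number of pages written since the previous one rather than with the total memory size, which is the property the abstract attributes to its COW-based design.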

Enhancing Dependability of Systems by Exploiting Storage Class Memory (스토리지 클래스 메모리를 활용한 시스템의 신뢰성 향상)

  • Kim, Hyo-Jeen;Noh, Sam-H.
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.1
    • /
    • pp.19-26
    • /
    • 2010
  • In this paper, we adopt Storage Class Memory, which is next-generation non-volatile RAM technology, as part of main memory parallel to DRAM, and exploit the SCM+DRAM main memory system from the dependability perspective. Our system provides instant system on/off without bootstrapping, dynamic selection of process persistence or non-persistence, and fast recovery from power and/or software failure. The advantages of our system are that it does not cause the problems of checkpointing, i.e., heavy overhead and recovery delay. Furthermore, as the system enables full application transparency, our system is easily applicable to real-world environments. As proof of the concept, we implemented a system based on a commodity Linux kernel 2.6.21 operating system. We verify that the persistence enabled processes continue to execute instantly at system off-on without any state and/or data loss. Therefore, we conclude that our system can improve availability and reliability.

Design and Analysis of Fault-Tolerant Object Group Framework for Effective Object Management and Load Distribution (효율적 객체 관리 및 부하 분산을 위한 고장포용 객체그룹 프레임워크 설계)

  • Kang, Myung-Seok;Jung, Jae-Yun;Kim, Hag-Bae
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.1B
    • /
    • pp.22-30
    • /
    • 2007
  • In this paper, to achieve consistency maintenance as well as stable service execution, we build a Fault-Tolerant Object Group (FTOG) framework that provides both a group management service and a load scheduling service. The group management service supports object management functions such as registration and authentication, and provides two schemes for failure recovery, one using service priority and one using checkpointing. In the load scheduling service, we improve the effectiveness of service execution by reasoning about object loads based on the ANFIS architecture. The performance of the developed framework is validated through a virtual home-network simulation based on the FTOG framework.

A Fault-tolerant Scheme for Clustering Routing Protocols (클러스터 기반 라우팅 프로토콜을 위한 결함허용기법)

  • Min, Hong;Kim, Bong-Jae;Jung, Jin-Man;Kim, Seuk-Hyun;Yoon, Jin-Hyuk;Cho, Yoo-Kun;Heo, Jun-Young;Yi, Sang-Ho;Hong, Ji-Man
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.6
    • /
    • pp.668-672
    • /
    • 2010
  • In wireless sensor networks, a fault-tolerant scheme that detects the failure of sensor nodes and improves the reliability of collected information must be considered. Resource-constrained sensor nodes are vulnerable to failure and cannot use existing checkpointing schemes, which do not take the characteristics of sensor networks into account. In this paper, we propose a fault-tolerant scheme for clustering routing protocols that supports the recovery of a head node.

An Implementation of Fault Tolerant Software Distributed Shared Memory with Remote Logging (원격 로깅 기법을 이용하는 고장 허용 소프트웨어 분산공유메모리 시스템의 구현)

  • 박소연;김영재;맹승렬
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.5_6
    • /
    • pp.328-334
    • /
    • 2004
  • Software DSMs continue to improve in performance and scalability. As software DSMs become attractive on larger clusters, the focus of attention is likely to move toward improving system reliability. A popular approach to tolerating failures is message logging with checkpointing, and many log-based rollback recovery schemes have been proposed. In this work, we propose a remote logging scheme that uses the volatile memory of a remote node assigned to each node. Because remote logging does not incur frequent disk accesses during failure-free execution, its logging overhead is not significant, especially over a high-speed communication network. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes are alive, which considerably improves the reliability of DSMs. We have designed and implemented FT-KDSM (Fault Tolerant KAIST DSM) with remote logging and report its logging overhead and recovery time.
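
    The remote logging idea can be sketched as a toy simulation: each node appends its log records to the volatile memory of an assigned backup node and, after a crash, restores its checkpoint and replays that log. This is an illustrative sketch only, not FT-KDSM; the `Node` class, its fields, and the integer "state" are hypothetical, and the checkpoint is simply assumed to survive a crash.

    ```python
    class Node:
        """Toy node that logs updates in a backup node's memory (hypothetical)."""

        def __init__(self, name):
            self.name = name
            self.state = 0          # volatile application state
            self.checkpoint = 0     # assumed to survive a crash
            self.remote_logs = {}   # logs held in volatile memory for other nodes
            self.backup = None      # node assigned to hold this node's log
            self.alive = True

        def deliver(self, value):
            # Log in the backup node's memory first, then apply the update.
            self.backup.remote_logs.setdefault(self.name, []).append(value)
            self.state += value

        def take_checkpoint(self):
            self.checkpoint = self.state
            # Records older than the checkpoint are no longer needed.
            self.backup.remote_logs[self.name] = []

        def crash(self):
            self.alive = False
            self.state = None
            self.remote_logs = {}   # logs held for other nodes are lost too

        def recover(self):
            if not self.backup.alive:
                raise RuntimeError("backup node is down; the log is unavailable")
            # Restore the checkpoint, then replay the remotely held log.
            self.state = self.checkpoint
            for value in self.backup.remote_logs.get(self.name, []):
                self.state += value
            self.alive = True


    a, b = Node("a"), Node("b")
    a.backup, b.backup = b, a
    a.deliver(3)
    a.take_checkpoint()
    a.deliver(4)        # logged only in b's memory
    a.crash()
    a.recover()         # checkpoint (3) plus the replayed record (4)
    ```

    The sketch also shows the condition stated in the abstract: recovery succeeds only while the failed node's backup is still alive, because the log exists nowhere else.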

Garbage Collection Protocol of Fault Tolerance Information in Multi-agent Environments (멀티에이전트 환경에서 결함 포용 정보의 쓰레기 처리 기법)

  • 이대원;정광식;이화민;신상철;이영준;유헌창;이원규
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.3_4
    • /
    • pp.204-212
    • /
    • 2004
  • Distributed systems have a higher probability of failure than stand-alone systems, so many fault-tolerance techniques have been developed. As the amount of stored fault-tolerance information grows, storage becomes insufficient and system performance degrades. To avoid this degradation, useless fault-tolerance information needs to be deleted. In this paper, we propose a garbage collection algorithm for fault-tolerance information. We define and design a garbage collection agent for collecting fault-tolerance information, an information agent for managing fault-tolerance data, and a facilitator agent for communication between agents. For rollback recovery, we use an independent checkpointing protocol and a sender-based pessimistic message logging protocol. In the proposed algorithm, the garbage collection, information, and facilitator agents are created together with each process, and the information agent constructs domain knowledge from its checkpoints and non-deterministic events. The garbage collection agent decides when to collect garbage and deletes useless fault-tolerance information in cooperation with the information and facilitator agents. To validate the proposed agent-based garbage collection technique, we compare the domain knowledge of a system that performs garbage collection after rollback recovery with that of a system that does not perform garbage collection.

The Hybrid Fault Tolerant Technique for Embedded System (임베디드 시스템을 위한 복합 결함 허용 기법)

  • Kook, Joong-Jin;Hong, Ji-Man
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.06b
    • /
    • pp.273-278
    • /
    • 2007
  • When fault tolerance is applied to embedded systems using a checkpointing and recovery facility, the overhead of write operations greatly reduces its practicality. If, running alongside a real-time operating system, the fault tolerance and recovery facility ends up degrading system performance under certain limiting conditions, it becomes a useless tool that will not be used. A checkpointing facility can therefore show its true value only when the time spent writing the process image saved for recovery is greatly reduced. In this paper, we study how to reduce the checkpoint-writing overhead, which has been the main cause of performance degradation in existing checkpointing facilities, by using NVSRAM (Non-Volatile SRAM) as the storage device of the checkpointing and recovery facility. To reduce checkpoint-writing time, we store the recovery-related process data held in main memory in NVSRAM, a non-volatile storage device with SRAM characteristics, thereby minimizing the time spent on disk access, and we implement a checkpointing facility that is practical for embedded systems. To verify the results, we compare the performance obtained with NVSRAM against that obtained with flash memory, main memory, and remote memory, the storage devices used in existing systems. We expect the proposed fault-tolerance facility to perform effectively when applied to real systems and to contribute to research on fault-tolerance facilities that use next-generation memory.
