• Title/Summary/Keyword: Checkpointing and Recovery

Recoverable Distributed Shared Memory Systems Using Object-Oriented Dependency Tracking and Checkpointing (객체지향 종속 추적 및 체크포인팅(checkpointing)을 이용한 복구 가능한 분산 공유 메모리 시스템)

  • Kim, Jae-Hun
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.476-484 / 1999
  • Many message logging and checkpointing schemes have been proposed for fault tolerance in distributed systems whose nodes communicate by message passing. Most research on recoverable distributed shared memory (DSM) adopts schemes similar to those used in message passing systems. However, such schemes are not always appropriate for direct use in DSM systems, because message passing systems and DSM systems differ in nature (function shipping versus data shipping). Many modified schemes have been proposed for DSM systems to resolve these differences. In this paper, an object-oriented approach is proposed for recoverable DSM. We present a new dependency tracking scheme that tracks dependencies between pages instead of between processes. Based on this scheme, we propose new checkpointing and recovery schemes that reduce the overhead of making DSM recoverable.

Page-level Incremental Checkpointing for Efficient Use of Stable Storage (안정 저장장치의 효율적 사용을 위한 페이지 기반 점진적 검사점 기법)

  • Heo, Jun-Young;Yi, Sang-Ho;Gu, Bon-Cheol;Cho, Yoo-Kun;Hong, Ji-Man
    • Journal of KIISE:Computer Systems and Theory / v.34 no.12 / pp.610-617 / 2007
  • Incremental checkpointing, which is intended to minimize checkpointing overhead, saves only the modified pages of a process. However, the cumulative size of incremental checkpoints increases at a steady rate over time because several updated values may be saved for the same page. In this paper, we present a comprehensive overview of Pickpt, a page-level incremental checkpointing facility. Pickpt provides space-efficient techniques aimed at minimizing the use of disk space. Our experimental results show that Pickpt significantly reduces disk space usage compared with existing incremental checkpointing.

An Efficient Checkpointing Method for Mobile Hosts via the Software Agent (이동 기기에 적합한 소프트웨어 에이전트 기반의 효율적 체크포인팅 기법)

  • Lim, Sung-Chae
    • The KIPS Transactions:PartA / v.15A no.2 / pp.111-118 / 2008
  • With advances in mobile communication systems, the need for distributed applications running on multiple mobile devices is gradually growing. Because such applications are more prone to hardware failures of the mobile device and to communication disruptions than traditional applications in fixed networks, it is crucial to develop a recovery mechanism suited to them. Checkpointing is widely used for this purpose to restart interrupted applications. In this paper, we devise an efficient checkpointing method that adopts a software agent executed at the mobile support station. The agent, called the checkpointing agent, supports the concept of rollback-distance (R-distance), which bounds the maximum number of rolled-back local checkpoints. By means of the R-distance, our method can prevent undesirable domino effects and heavy checkpoint overhead while providing high flexibility in checkpoint creation.

Design and Implementation of a User-based MPI Checkpointer for Portability (이식성을 고려한 사용자기반 MPI 체크포인터의 설계 및 구현)

  • Ahn Sun-Il;Han Sang-Yong
    • Journal of KIISE:Computer Systems and Theory / v.33 no.1_2 / pp.35-43 / 2006
  • An MPI checkpointer is a tool that provides fault tolerance through checkpointing. Previous research on MPI checkpointers has focused on automatic checkpointing and recovery capabilities but has not considered portability. In this paper, we discuss the design and implementation issues we considered for portability when developing an MPI checkpointer called STFT. To increase portability, STFT first supports an abstraction interface for a single-process checkpointer. Second, STFT uses a user-based checkpointing method and limits the places where a user can take a checkpoint. Third, STFT lets MPI_Init create network connections to the other MPI processes in a fixed order. With these features, we expect STFT to be easily adaptable to various platforms and MPI implementations, and we confirmed with a prototype implementation that STFT is easily adaptable to LAM and MPICH/P4.

Design of Main-Memory Database Prototype System using Fuzzy Checkpoint Technique in Real-Time Environment (실시간 시스템에서 퍼지 검사점을 이용한 주기억 데이터베이스 프로토타입 시스템의 설계)

  • Park, Yong-Mun;Lee, Chan-Seop;Choe, Ui-In
    • The Transactions of the Korea Information Processing Society / v.7 no.6 / pp.1753-1765 / 2000
  • As the areas of computer application expand, real-time application environments that must process as many transactions as possible within their deadlines, such as stock transaction systems and ATM switching systems, have recently become more common. Conventional database systems cannot handle soft real-time applications because they lack predictability and perform poorly at meeting transaction deadlines. If transactions must access data stored in secondary storage, they cannot satisfy the requirements of real-time applications because of disk delay. This paper designs a main-memory database prototype system suited to real-time applications; because all of the information is loaded into the main-memory database, the system can produce results rapidly without disk I/O. The paper proposes improved techniques for logging, checkpointing, and recovery in this environment. To improve system performance, a) the proposed redo technique reduces the frequency of log analysis and redo processing at system failure, and b) improved fuzzy checkpointing maintains database consistency. A performance model consisting of two parts is proposed. The first part evaluates log processing time for recovery and compares it with other research. The second part examines checkpointing behavior.

A Recovery Scheme of a Cluster Head Failure for Underwater Wireless Sensor Networks (수중 무선 센서 네트워크를 위한 클러스터 헤드 오류 복구 기법)

  • Heo, Jun-Young;Min, Hong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.4 / pp.17-22 / 2011
  • Underwater environments are quite different from terrestrial ones in terms of the communication channel and constraints. In underwater wireless sensor networks, the probability of node failure is high because sensor nodes are deployed in harsher environments than ground-based networks and are moved by waves and currents. Existing research considers the underwater communication environment in order to improve data transmission throughput. In this paper, we present a checkpointing scheme for cluster heads that recovers quickly from a cluster head failure. Experimental results show that the proposed scheme enhances the reliability of the network and is more efficient in terms of energy consumption and recovery latency than operation without checkpointing.

Power-aware Real-time Task Scheduling in Dependable Embedded Systems (신뢰도를 요구하는 임베디드 시스템에서의 저전력 태스크 스케쥴링)

  • Kim, Kyong Hoon;Kim, Yuna;Kim, Jong
    • IEMEK Journal of Embedded Systems and Applications / v.3 no.1 / pp.25-29 / 2008
  • In this paper, we provide an adaptive power-aware checkpointing scheme for fixed priority-based DVS scheduling in dependable real-time systems. In the provided scheme, we analyze the minimum number of tolerable faults of a task and the optimal checkpointing interval in order to meet the deadline and guarantee its specified reliability. The energy-efficient voltage level at a fault arrival is also analyzed and used in the recovery of the faulty task.

Design of a Fault-tolerant Embedded Controller for Railway Signaling Systems

  • Cho, Yong-Gee;Lim, Jae-Sik
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2002.10a / pp.68.4-68 / 2002
  • This report presents an implementation of a set of reusable software components for a fault-tolerant embedded controller for railway signalling systems. These components can be used in real-time applications without application reprogramming. This library runs under the VxWorks operating system and is oriented toward real-time embedded systems. The library includes fault detection, fault containment, checkpointing and recovery components. The library supports high-speed response to fault occurrence in application software. A garbage collector together with the VxWorks watchdog provides both detection of dead tasks and removal of useless resources to avoid an overflow. Control flow...

An Efficient Merging Algorithm for Recovery and Garbage Collection in Incremental Checkpointing (점진적 검사점에서 복구와 쓰레기 수집을 위한 효율적인 병합 알고리즘)

  • Heo, Jun-Young;Yi, Sang-Ho;Cho, Yoo-Kun;Hong, Ji-Man
    • Proceedings of the Korean Information Science Society Conference / 2004.04a / pp.151-153 / 2004
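  • Incremental checkpointing uses page write protection to save only the pages modified since the previous checkpoint. While incremental checkpointing reduces checkpoint overhead, the memory pages of a process are spread across multiple checkpoints, so old checkpoints cannot be merged or deleted. In this paper, we propose an efficient merging algorithm for recovery and garbage collection in incremental checkpointing. The proposed algorithm merges incremental checkpoints to build a full checkpoint for recovery and allows unnecessary checkpoints to be deleted.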
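
The general idea of merging incremental checkpoints can be illustrated with a minimal Python sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It assumes a hypothetical representation in which each checkpoint is a dictionary from page number to saved page contents, and the names `merge_checkpoints` and `obsolete_checkpoints` are invented for illustration.

```python
from typing import Dict, List

# Assumption: a checkpoint maps page number -> page contents saved at that time.
Checkpoint = Dict[int, bytes]


def merge_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Merge checkpoints (ordered oldest first) into one full checkpoint.

    For each page, the copy from the most recent checkpoint that saved it wins.
    """
    merged: Checkpoint = {}
    for ckpt in checkpoints:
        merged.update(ckpt)  # newer pages overwrite older copies
    return merged


def obsolete_checkpoints(checkpoints: List[Checkpoint]) -> List[int]:
    """Return indices of checkpoints that hold no newest copy of any page.

    Such checkpoints contribute nothing to the merged image and could be
    garbage collected.
    """
    newest: Dict[int, int] = {}  # page number -> index of latest checkpoint saving it
    for idx, ckpt in enumerate(checkpoints):
        for page in ckpt:
            newest[page] = idx
    live = set(newest.values())
    return [i for i in range(len(checkpoints)) if i not in live]


if __name__ == "__main__":
    full = {0: b"A0", 1: b"B0", 2: b"C0"}  # initial full checkpoint
    inc1 = {1: b"B1"}                      # only page 1 changed
    inc2 = {0: b"A2", 1: b"B2"}            # pages 0 and 1 changed
    history = [full, inc1, inc2]
    print(merge_checkpoints(history))
    print(obsolete_checkpoints(history))
```

In this sketch the second checkpoint becomes obsolete because a later checkpoint saved page 1 again, while the first checkpoint stays live because it still holds the only copy of page 2.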

Consistency preservation techniques for Location Register System in Mobile Networks

  • Kim, Jang-Hwan
    • International Journal of Internet, Broadcasting and Communication / v.12 no.2 / pp.144-149 / 2020
  • A database called the Home Location Register (HLR) plays a major role in location management in mobile cellular networks. The objectives of this paper are to identify the problems of the current HLR system through rigorous analysis and to suggest solutions to them. The current HLR backup method simply writes the changed memory SLD block to disk, which makes it difficult to maintain database consistency. Since information change and backup are performed by separate processes, there is a risk of information inconsistency when an error restart occurs. To solve this problem, a transaction concept was introduced for subscriber-related operation functions, along with a recovery method based on logging and checkpointing. The subscriber-related functions of tasks that terminated normally under the suggested process are recovered consistently even after a system restart. Performance is also not seriously affected because disk operations for the log occur only for subscriber-related functions.