• Title/Summary/Keyword: fault tolerance information


A Super-Peer Coordination Scheme for Decentralized Peer-to-Peer Networking Using Mobile Agents

  • Chung, Won-Ho;Kang, Namhi
    • International journal of advanced smart convergence / v.4 no.2 / pp.38-45 / 2015
  • Peer-to-peer (P2P) systems are generally classified into two categories: hybrid and pure P2P. Hybrid systems have a single central index server that keeps the details of shared information, which can cause undesirable effects such as heavy load on that server and a lack of fault tolerance. Pure P2P systems show a high degree of fault tolerance but suffer from other problems such as message flooding and limited scalability. Recently, mobile agent-based distributed computing has been receiving wide attention for its potential to support disconnected operations and high asynchrony, and thus to save network bandwidth. In this paper, a new peer coordination scheme is proposed for a decentralized P2P network with a self-organizing structure. We deploy mobile agents to bring their advantages into our P2P network. The proposed P2P network combines the advantages of hybrid and pure P2P. The problems of heavy server load and lack of fault tolerance are mitigated by using multiple special peers called super-peers, and the problems of pure P2P are reduced by using mobile agents.

A Fault-Tolerant Mobile Agent Framework and Replication Study for Internet Applications (인터넷 응용을 위한 고장 감내 이동 에이전트 프레임워크와 레플리케이션 연구)

  • Park, Kyeong-mo
    • The KIPS Transactions:PartA / v.10A no.6 / pp.701-708 / 2003
  • This paper addresses the dependability of distributed mobile agents in the Internet environment. We propose an architectural framework that makes mobile agents fault-tolerant for Internet applications. Replication of agents and data is of great importance for achieving fault tolerance in distributed systems over the Internet. This research focuses on the replication component of the proposed fault-tolerant mobile agent framework. We present and analyze performance results from a simulation study of the effects of the degree of replication, the active and passive replication strategies, and the replication scale.

Design and Implementation of Adaptive Fault-Tolerant Management System over Grid (그리드 환경의 적응형 오류 극복 관리 시스템 설계 및 구현)

  • Kim, Eun-Kyung;Kim, Jeu-Young;Kim, Yoon-Hee
    • The KIPS Transactions:PartA / v.15A no.3 / pp.151-154 / 2008
  • Middleware in a grid computing environment is required to support seamless on-demand services over diverse resource situations in order to meet various user requirements [1]. Grid computing applications therefore need situation-aware middleware services in this environment. In this paper, we propose a semantic middleware architecture that supports dynamic software component reconfiguration based on fault and service ontologies to provide fault tolerance in a grid computing environment. Our middleware includes autonomic management that detects faults, analyzes their causes, and plans semantically meaningful strategies to recover from failure using pre-defined fault and service ontology trees. We implemented a reference prototype, the Web-service based Application Execution Environment (Wapee), as a proof of concept and showed its efficiency in runtime recovery.

Service-Dependability-Case based Self-Adaptation in Service-Oriented Environment (서비스 지향 컴퓨팅 환경에서 서비스 안정성 케이스 기반 자가 적응 방법)

  • Jung, Changhee;Lee, Seok-Won
    • Journal of KIISE / v.42 no.11 / pp.1339-1348 / 2015
  • In a distributed system environment based on a service-oriented architecture, separate systems collaborate to achieve the goals of the entire system by using services provided by other systems. A service quality violation in one service can cause a runtime system failure in this environment. Existing self-adaptation methods follow a fault tolerance mechanism that responds to a failure only after a service quality violation has occurred; in other words, these methods are limited to reactive action. Therefore, a service-dependability-case based self-adaptation mechanism is necessary to preserve the dependability of the self-adaptive system. This paper demonstrates that the service-dependability-case based self-adaptation mechanism preserves the dependability of the self-adaptive system better than QoS (quality of service)-based self-adaptation with fault tolerance. Additionally, this paper suggests: a method to present and analyze service dependability using GSN (Goal Structuring Notation), an existing modeling method for presenting assurance cases; an adaptation action mechanism that uses the analysis results of service-dependability cases; a method of leveraging the service-dependability-case based self-adaptation mechanism across the service's life cycle; and a framework architecture including the major components and their interactions in the control loop of the self-adaptation process.

An Algorithm For Load-Sharing and Fault-Tolerance In Internet-Based Clustering Systems (인터넷 기반 클러스터 시스템 환경에서 부하공유 및 결함허용 알고리즘)

  • Choi, In-Bok;Lee, Jae-Dong
    • The KIPS Transactions:PartA / v.10A no.3 / pp.215-224 / 2003
  • Because the Internet contains various networks and heterogeneous nodes, existing load-sharing algorithms are hard to adapt for use in Internet-based clustering systems. A load-sharing algorithm for such systems must therefore consider conditions such as heterogeneity of nodes, network characteristics, and load imbalance. This paper proposes an expanded-WF algorithm, based on the WF (Weighted Factoring) algorithm, for load sharing in Internet-based clustering systems. The proposed algorithm uses an adaptive granularity strategy for load sharing and duplicate execution of partial jobs for fault tolerance. For the simulation, matrix multiplication using PVM is performed in a heterogeneous clustering environment consisting of two different networks. Compared with other algorithms such as Send, GSS, and Weighted Factoring, the proposed algorithm improves performance by 55%, 63%, and 20%, respectively. The paper also shows that the algorithm can tolerate faults.
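For readers unfamiliar with the baseline, the following is a minimal sketch of plain weighted factoring, not the paper's expanded-WF algorithm. It assumes the commonly described rule that each round hands out about half of the remaining work, split among workers in proportion to a speed weight. The function name `weighted_factoring_schedule` and its parameters are hypothetical and chosen only for illustration.

```python
def weighted_factoring_schedule(total_iterations, weights, min_chunk=1):
    """Return a list of (worker_index, chunk_size) pairs in scheduling order.

    Assumption: `weights` are positive integers describing relative worker
    speed. Each round distributes roughly half of the remaining iterations,
    split in proportion to the weights (plain weighted factoring).
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    remaining = total_iterations
    schedule = []
    while remaining > 0:
        # One round ("batch") covers half of what is left, at least min_chunk.
        batch = max(remaining // 2, min_chunk)
        for worker, weight in enumerate(weights):
            if remaining == 0:
                break
            # Faster workers (larger weight) receive larger chunks.
            chunk = max(min_chunk, batch * weight // total_weight)
            chunk = min(chunk, remaining)
            schedule.append((worker, chunk))
            remaining -= chunk
    return schedule


if __name__ == "__main__":
    for worker, chunk in weighted_factoring_schedule(100, [1, 3]):
        print(f"worker {worker}: {chunk} iterations")
```

Chunk sizes shrink from round to round, so late chunks are small enough to even out finishing times on heterogeneous nodes.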

Design and Implementation of Reliable Distributed Programming Environment based on HORB (HORB에 기반한 신뢰성 있는 분산 프로그래밍 환경의 설계 및 구현)

  • Hyun, Mu-Yong;Kim, Shik;Kim, Myung-Jun
    • Journal of the Institute of Electronics Engineers of Korea CI / v.39 no.2 / pp.1-9 / 2002
  • The use of Object-Oriented Distributed Programming (OODP) environments such as DCOM, DSOM, Java RMI, and CORBA to implement distributed applications is becoming increasingly popular. Although these middleware platforms greatly enhance the quality and reusability of distributed object-based applications, the absence of a fault-tolerance feature in them complicates the design and implementation of reliable distributed object-based applications. In this paper, we propose Evergreen, a fault-tolerant programming environment based on RMI, for reliable distributed computing with checkpointing and a rollback-recovery mechanism. Based on a series of experiments, we evaluate the performance of Evergreen and find that it can be extended to fully support our optimal design goal.
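The general idea of checkpointing with rollback recovery can be sketched as below. This is an illustrative in-memory example only; it is not Evergreen's design or API, and the names `CheckpointedTask` and `run_with_recovery` are hypothetical. It assumes the task state can be deep-copied and that a failed step may be retried.

```python
import copy


class CheckpointedTask:
    """Holds mutable state plus the last known-good copy of it."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        # Save a consistent snapshot after a step completes successfully.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Discard partial changes and restore the last snapshot.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_recovery(task, steps, max_retries=3):
    """Run each step; on failure, roll back and retry the same step."""
    for step in steps:
        for attempt in range(max_retries + 1):
            try:
                step(task.state)
                task.checkpoint()
                break
            except Exception:
                task.rollback()
                if attempt == max_retries:
                    raise
    return task.state


if __name__ == "__main__":
    failures = {"count": 0}

    def add_one(state):
        state["total"] += 1

    def flaky_add_ten(state):
        state["total"] += 10  # partial update before the simulated crash
        failures["count"] += 1
        if failures["count"] == 1:
            raise RuntimeError("simulated crash")

    task = CheckpointedTask({"total": 0})
    print(run_with_recovery(task, [add_one, flaky_add_ten, add_one]))
```

The first run of `flaky_add_ten` modifies the state and then fails; the rollback removes that partial update, so the retry starts from the state saved after `add_one`.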

Switchover Time Analysis of Primary-Backup Server Systems Based on Software Rejuvenation (소프트웨어 재활기법에 기반한 주-여분 서버 시스템의 작업전이 시간 분석)

  • Lee, Jae-Sung;Park, Kie-Jin;Kim, Sung-Soo
    • The KIPS Transactions:PartA / v.8A no.2 / pp.81-90 / 2001
  • With the rapid growth of the Internet, computer systems are growing in size and complexity. To meet the high availability requirements of these systems, both hardware and software fault tolerance techniques are usually used. To prevent failures caused by the software-aging phenomenon that arises from long mission times, we adopt a software rejuvenation method that intentionally stops and restarts the software on the servers. The method returns the systems to a clean and healthy state in which the probability of fault occurrence is very low. In this paper, we study how switchover time affects software rejuvenation of primary-backup server systems. Through experiments, we find that switchover time is an essential factor in deciding the rejuvenation policy.


Implementation and Fault-tolerance Tests of Load Balanced and Duplicated Active-Active Web Servers (로드 밸런싱 Active-Active 방식의 웹 서버 이중화 구축 및 결함내성 시험)

  • Choi, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.1 / pp.63-71 / 2014
  • In this paper, we study duplication techniques for active-active web servers. The rsync and crontab utilities periodically copy data between the web servers so that they maintain the same state. A load-balancing server balances load across the web servers and speeds up service by dispatching requests to the web servers alternately. Even if one web server stops because of a critical error, the remaining web server can take over and continue to provide service.
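The alternating dispatch and takeover behavior described above can be illustrated with a small round-robin selector that skips servers marked as down. This is a sketch under assumptions, not the paper's load balancer: the class `RoundRobinBalancer`, its methods, and the server names `web1` and `web2` are hypothetical, and health detection is reduced to explicit `mark_down` and `mark_up` calls.

```python
class RoundRobinBalancer:
    """Pick servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web1", "web2"])
    print([balancer.pick() for _ in range(4)])  # alternates between servers
    balancer.mark_down("web1")                  # simulate a failed server
    print([balancer.pick() for _ in range(3)])  # the survivor takes over
```

With both servers healthy, requests alternate; once one server is marked down, every request goes to the remaining one until the failed server is marked up again.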

Error Recovery Technique for Improving Reliability of Embedded Systems

  • Son, Sunghoon
    • Journal of the Korea Society of Computer and Information / v.22 no.6 / pp.1-8 / 2017
  • In this paper, we propose a fault tolerance technique that enables embedded systems to run without interruption even when the operating system and tasks fail. To improve reliability, the proposed scheme runs an embedded system as a virtual machine on a virtual machine monitor. It also prepares a standby virtual machine in which periodic backups of the embedded system are saved. When an error occurs in the main virtual machine, the corresponding standby virtual machine takes over the role of the main virtual machine and continues its operation. In particular, these backups and switches of virtual machines are performed with minor performance degradation by manipulating page table entries in the virtual machine monitor. Through performance evaluation studies, we show that the proposed scheme makes embedded systems robust against errors without significantly degrading system performance.

Design and Implementation of HA NAS with Fault-Tolerance (Fault-Tolerance 기능을 갖는 HA NAS 시스템의 설계 및 구현)

  • 김주영;박준희;권혁빈;서희정;정영준
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.664-666 / 2004
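  • Recently, as computers are increasingly used for work and the various content service industries develop, the use of NAS (Network Attached Storage), a network storage device built to handle only independent file server functions, has been steadily increasing. NAS addresses the problems of conventional file servers and is characterized by file sharing across heterogeneous platforms, storage scalability, and ease of management. However, a failure in a NAS system causes enormous economic losses. Therefore, in this paper we design and implement an HA (High Availability) NAS system that can recover efficiently when a NAS system failure occurs, and we confirm that services using the NAS are provided without interruption under various failure conditions.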
