• Title/Summary/Keyword: Fault-Tolerant Computer

Fault-tolerant sorting network with sub-switches (서브 스위치를 이용한 오류 허용 정렬 네트워크)

  • Kim, Heung-Jin;Son, Yoo-Ek
    • Proceedings of the Korea Information Processing Society Conference / 2002.04b / pp.1293-1296 / 2002
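  • This study proposes a fault-tolerant sorting network that uses sub-switches. Existing sorting networks suffer from problems with redundant paths and network complexity. To expand the redundant paths, the proposed structure adds $\frac{N}{4}\sum\limits_{i=0}^{n=2}{(\frac{1}{2})}^i$ sub-switches to each stage (where N is the number of inputs/outputs), which increases the number of redundant paths by $3^{n(n+1)/2}$. In addition, compared with the number of switching elements and links used by the dual network plane concept of existing networks, the proposed single-plane structure has lower complexity. In conclusion, the proposed fault-tolerant sorting network with sub-switches can increase the number of redundant paths while reducing hardware complexity.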

Fault-Tolerant Adaptive Routing : Improved RIFP by using SCP in Mesh Multicomputers (적응적 오류 허용 라우팅 : SCP를 이용한 메쉬 구조에서의 RIFP 기법 개선)

  • 정성우;김성천
    • Journal of KIISE:Computer Systems and Theory / v.30 no.11 / pp.603-609 / 2003
  • Adaptive routing methods have been studied for effective routing in topologies where faulty nodes are inevitable, and the mesh topology makes these methods simple to implement. Many routing methods for meshes can tolerate a large number of faults enclosed by a rectangular faulty block, but they treat even the good nodes inside the faulty block as faulty, which degrades node utilization. An existing method solves this problem by transmitting messages to destinations inside faulty blocks via multiple “intermediate nodes”. It also divides the faulty block into multiple expanded meshes, from which a DAG (Directed Acyclic Graph) is formed, and a message is routed along the shortest path according to the DAG. This can, however, introduce additional hops. We propose a method that reduces the number of hops by searching for direct paths from the destination node to the border of the faulty block. Such a path is called an SCP (Short-Cut Path). If the path and the traversing message are on the same side of the outer border of the faulty block, the message cuts into the path found by our method. This also reduces message latency between the source and destination nodes.
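
The utilization loss that rectangular faulty blocks cause can be illustrated with a minimal sketch. It assumes a single cluster of faults approximated by its bounding rectangle, which simplifies how faulty blocks are actually formed. The function names `faulty_block` and `sacrificed_nodes` are hypothetical and do not come from the paper.

```python
def faulty_block(faults):
    """Return the bounding rectangle (x0, y0, x1, y1) of a set of faulty nodes."""
    xs = [x for x, _ in faults]
    ys = [y for _, y in faults]
    return min(xs), min(ys), max(xs), max(ys)


def sacrificed_nodes(faults):
    """List the healthy nodes that a rectangular-block scheme would disable."""
    x0, y0, x1, y1 = faulty_block(faults)
    return [(x, y)
            for x in range(x0, x1 + 1)
            for y in range(y0, y1 + 1)
            if (x, y) not in faults]


# Three faulty nodes in a 2D mesh (coordinates are illustrative).
faults = {(1, 1), (2, 2), (3, 1)}
print(faulty_block(faults))      # bounding rectangle of the faults
print(sacrificed_nodes(faults))  # healthy nodes trapped inside the rectangle
```

In this example half of the nodes inside the rectangle are healthy, and these are the nodes that schemes routing into the faulty block aim to keep reachable.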

Availability Analysis of Multiplex Systems using Software Rejuvenation Method (소프트웨어 재활 기법을 적용한 다중계 시스템의 가용도 분석)

  • Park, Kie-Jin;Kim, Sung-Soo;Kim, Jai-Hoon
    • Journal of KIISE:Computer Systems and Theory / v.27 no.8 / pp.730-740 / 2000
  • The software rejuvenation method for highly available multiplex systems takes a proactive fault-tolerant approach to handling system failures. Software rejuvenation prevents failures from occurring, whereas previous methods recover from failures after they happen. Because software aging proceeds quickly in software used for multimedia mobile computing, owing to the loss of communications or data, preventive software rejuvenation is particularly applicable there. In this paper, we calculate the steady-state probabilities, downtime, availability, and cost of multiplex systems that use software rejuvenation, as functions of operational parameters such as the rejuvenation period, rejuvenation time, failure rate and repair rate of the servers, number of running servers, duration of running time, and type of running mode. We validate the closed-form solutions of the mathematical model by experiments over various operational parameters and find that software rejuvenation can be adopted as a preventive fault-tolerant technique. The failure rate and unstable rate of the servers are essential factors in deciding rejuvenation policies.
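
To make the idea of steady-state availability under rejuvenation concrete, here is a minimal sketch for a single server. It is not the paper's multiplex model or its closed-form solution. It assumes a four-state continuous-time Markov chain (healthy, unstable, rejuvenating, failed), and all state names, rate parameters, and function names are assumptions for illustration.

```python
def steady_state(unstable_rate, failure_rate, rejuv_rate, rejuv_done_rate, repair_rate):
    """Steady-state probabilities of an assumed 4-state chain:
    healthy -> unstable, unstable -> failed or rejuvenating,
    rejuvenating -> healthy, failed -> healthy."""
    # Solve the balance equations with the healthy state fixed at 1, then normalize.
    healthy = 1.0
    unstable = healthy * unstable_rate / (failure_rate + rejuv_rate)
    rejuvenating = unstable * rejuv_rate / rejuv_done_rate
    failed = unstable * failure_rate / repair_rate
    total = healthy + unstable + rejuvenating + failed
    return {
        "healthy": healthy / total,
        "unstable": unstable / total,
        "rejuvenating": rejuvenating / total,
        "failed": failed / total,
    }


def availability(probs):
    """The server is considered up while it is healthy or unstable."""
    return probs["healthy"] + probs["unstable"]


# Illustrative rates per hour (hypothetical values, not taken from the paper).
probs = steady_state(unstable_rate=0.01, failure_rate=0.005,
                     rejuv_rate=0.02, rejuv_done_rate=6.0, repair_rate=0.5)
print(availability(probs))
```

Setting `rejuv_rate` to 0 gives the same chain without rejuvenation, so the two availabilities can be compared for a given set of rates.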

How To Support Scalability in Causal Message Logging (인과적 메시지 로깅에서 확장성 지원 방법)

  • Kim, Ki-Bom;Hwang, Chung-Sun;Yu, Heon-Chang;Shon, Jin-Gon;Jung, Soon-Young
    • Journal of KIISE:Computer Systems and Theory / v.27 no.4 / pp.362-372 / 2000
  • Causal message logging is a low-cost technique for building a distributed system that can tolerate process crash failures. Previous research on causal message logging protocols assumes that the number of processes in a fault-tolerant system is fixed. Under this assumption, all processes must modify their data structures when a new process is added or an existing process terminates. The approach proposed in this paper instead allows each process to retain identifiers of only the processes it communicates with, rather than of all processes. This mechanism enables the fault-tolerant system to operate at many different scales. Using it, we develop a new algorithm that can be adapted for recovery in existing causal message logging protocols. Our recovery algorithm is 1) a distributed technique that does not require a recovery leader, 2) a nonblocking protocol that does not force live processes to block while recovery is in progress, and 3) a novel mechanism that can tolerate failures of an arbitrary number of processes. Earlier causal message logging protocols lack one or more of these properties.

A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers (다중 분할된 구조를 가지는 클러스터 검사점 저장 기법)

  • Chang, Yun-Seok
    • The KIPS Transactions:PartA / v.13A no.7 s.104 / pp.607-614 / 2006
  • To enhance the performance of processes running on a cluster system that writes checkpoints to its global stable storage, a checkpointing scheme should reduce process delay by managing the checkpoints of each node to fit the network load. For this reason, a cluster system with a single IO space on a distributed RAID chooses a suitable checkpointing scheme to obtain maximum IO performance and the best rollback recovery efficiency. In this paper, we improve the striped checkpointing scheme with a dynamic stripe group size that adapts to the variation in network bandwidth at the time of checkpointing. To analyze the performance of the multistriped checkpointing scheme, we ran the Linpack HPC benchmark with MPI on our own cluster system with up to 512 virtual nodes. The benchmark results show that the multistriped checkpointing scheme outperforms the striped checkpointing scheme in checkpoint writing efficiency and rollback recovery under heavy system load.

A Recovery Mechanism for Server Failure in Database Systems based on Mobile computing Environments (이동 컴퓨팅 환경에 기반을 둔 데이터베이스 시스템에서 서버의 고장 회복 기법)

  • Jo, Jeong-Ran;Hwang, Bu-Hyeon
    • The Transactions of the Korea Information Processing Society / v.6 no.1 / pp.1-10 / 1999
  • A mobile computing environment is one that supports user mobility through wireless communication technology. Users access the database and obtain the results they want by running mobile transactions. To run mobile transactions correctly and to maintain the consistency of the database, we need a concurrency control method to schedule transactions, a caching method to manage the cache, and a recovery method to construct a fault-tolerant system. A mobile computing system is based on the existing distributed system, but the recovery methods of existing distributed systems cannot be used directly because of user mobility and the characteristics of wireless media. This paper therefore presents a recovery mechanism for constructing a fault-tolerant mobile computing system. In particular, we develop and analyze a recovery algorithm for server failure, one of the types of failure that can arise in mobile computing environments.

Fault-tolerant Algorithm for Resource Selection Based on Mobile Devices' Characteristics in Mobile Grid (모바일 그리드에서 모바일 장치의 특성을 고려한 결함 포용적 자원 선택 알고리즘)

  • Choi, Sook-Kyong;Lee, Jong-Hyuk;Chung, Kwang-Sik;Yu, Heon-Chang
    • Proceedings of the Korean Information Science Society Conference / 2007.06d / pp.261-266 / 2007
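  • Grid computing provides an environment in which tasks can be carried out efficiently by enabling resource sharing across heterogeneous environments. Grid computing is currently evolving toward the wireless grid, which integrates mobile devices into the wired grid environment, and the mobile grid, which takes user mobility into account. Mobile device performance has increased dramatically, and the number of users has grown considerably in the last few years. This paper therefore proposes a fault-tolerant resource selection algorithm that considers the characteristics of mobile devices, so that mobile devices can be used as resources in a mobile grid environment. The algorithm 1) considers each mobile device's remaining battery level, mobility information, and performance information, 2) computes a ranking of the mobile devices and classifies them into k groups, and 3) when assigning tasks, distributes them simultaneously to the highest-ranked group and the second-highest-ranked group for fault tolerance. Because the characteristics of mobile devices are dynamic, the ranking and grouping are performed each time a task is requested.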
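
The three steps listed in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the weighted-sum score, the equal weights, the `stability` encoding of mobility, and every name in the code are hypothetical, since the abstract does not specify how the ranking is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    battery: float      # remaining battery, normalized to 0..1
    stability: float    # 1.0 = stationary, 0.0 = highly mobile (assumed encoding)
    performance: float  # normalized device performance, 0..1


def score(device, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three characteristics (weights are hypothetical)."""
    wb, ws, wp = weights
    return wb * device.battery + ws * device.stability + wp * device.performance


def group_devices(devices, k):
    """Rank devices by score and split the ranking into at most k groups (k >= 1)."""
    if not devices:
        return []
    ranked = sorted(devices, key=score, reverse=True)
    size = -(-len(ranked) // k)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def assign(task, devices, k):
    """Re-rank on every request, then replicate the task on the best device
    of the top group and of the second group."""
    groups = group_devices(devices, k)
    return {task: [group[0].name for group in groups[:2]]}


devices = [
    Device("a", 0.9, 0.9, 0.9),
    Device("b", 0.8, 0.7, 0.9),
    Device("c", 0.5, 0.4, 0.6),
    Device("d", 0.2, 0.3, 0.1),
]
print(assign("t1", devices, k=2))
```

Placing the replica in the second group rather than in the top group is one reading of distributing work to both groups at once. It keeps the top group free for other tasks while still providing a backup if the primary device drops out.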

Fault-Tolerant Scheduling Mechanism based on Self-organizing Computation Overlay Network in Decentralized P2P Grid System (분산형 P2P 그리드 시스템에서 자가 조직적 계산 오버레이 네트워크 기반 결함 포용적 스케줄링 기법)

  • Kim Seok-In;Park Chan-Yeol;Choi Jang-Won;Kim Hong-Soo;Gil Joon-Min;Hwang Chong-Sun
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.415-417 / 2006
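  • In building a decentralized P2P grid system, a node organization technique for carrying out computation, together with a computation execution model and a scheduling technique suited to the resulting topology, are essential elements. However, because previous studies used computation execution models that did not consider the volatility of resource providers, stable execution of computations is not guaranteed and system performance degrades. This paper therefore proposes an availability-based technique for constructing a Self-organizing Computation Overlay Network (SelfCON), along with a computation execution model and a scheduling technique suited to the constructed topology. The proposed technique improves overall computation performance by increasing stability through consideration of the volatility of resource provider nodes.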

Efficient Fault-Tolerant Multicast on Hypercube Multicomputer System (하이퍼 큐브 컴퓨터에서 효과적인 오류 허용 다중전송기법)

  • 명훈주;김성천
    • Journal of KIISE:Computer Systems and Theory / v.30 no.5_6 / pp.273-279 / 2003
  • Hypercube multicomputers have drawn considerable attention from researchers because of their regular structure and short diameter. One key to hypercube performance is the efficiency of communication among processors. Among the various communication patterns, multicast is important and is found in a variety of applications such as data replication and signal processing. As the number of processors increases, the probability of faulty components also increases, so it is desirable to design an efficient scheme that multicasts messages in the presence of faulty components. Fault-tolerant routing and multicast schemes can be classified by the information they use as local-information-based, global-information-based, or limited-information-based. In general, limited information is easy to obtain and maintain because it is compressed into a concise format. In this paper, we propose a new routing scheme and a new multicast scheme that use the recently proposed full reachability information scheme together with a new local information scheme. The proposed multicast scheme increases the probability of multicast success and reduces the number of deroutes. Experiments show that the probability of multicast success increases by at least 15% compared with the previous method.
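
As background for local-information-based routing, the following minimal sketch shows a generic greedy fault-avoiding route in a hypercube. It is not the routing or multicast scheme proposed in the paper. Each node only checks whether its immediate neighbors are faulty, and the function name `greedy_route` and its parameters are hypothetical.

```python
def greedy_route(src, dst, faulty, dims):
    """Route from src to dst in a dims-dimensional hypercube.

    Nodes are integers whose bits are hypercube coordinates. At each step the
    message moves along a dimension in which the current node and dst differ,
    skipping neighbors that are known to be faulty. Returns the path as a list
    of nodes, or None when every such neighbor is faulty (a real scheme would
    deroute at this point instead of giving up).
    """
    path = [src]
    current = src
    while current != dst:
        differing = current ^ dst
        for d in range(dims):
            if (differing >> d) & 1:
                neighbor = current ^ (1 << d)
                if neighbor not in faulty:
                    break
        else:
            return None  # no fault-free neighbor brings the message closer
        current = neighbor
        path.append(current)
    return path


# 3-dimensional hypercube with node 001 faulty.
print(greedy_route(0b000, 0b111, faulty={0b001}, dims=3))
```

Every hop reduces the Hamming distance to the destination by one, so the loop ends after at most `dims` hops. The cases where the sketch returns `None` are those in which the message would have to move away from the destination.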

Constructing Algorithm for Optimal Edge-Disjoint Spanning Trees in Odd Interconnection Network $O_d$ (오드 연결망 $O_d$에서 에지 중복 없는 최적 스패닝 트리를 구성하는 알고리즘)

  • Kim, Jong-Seok;Lee, Hyeong-Ok;Kim, Sung-Won
    • Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.429-436 / 2009
  • The odd network was introduced as a graph-theoretic model. In [1], it was introduced as a class of fault-tolerant multiprocessor networks and shown to have many useful properties, such as simple routing algorithms, maximal fault tolerance, and node-disjoint paths. In this paper, we show an algorithm for constructing edge-disjoint spanning trees in the odd network $O_d$. We also prove that the edge-disjoint spanning trees generated by our algorithm are optimal.
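
For readers unfamiliar with the topology, the sketch below builds $O_d$ from the standard graph-theoretic definition of the odd graph: vertices are the $(d-1)$-element subsets of a $(2d-1)$-element set, and two vertices are adjacent when the subsets are disjoint. It does not implement the paper's tree construction. It only computes a simple counting bound on how many edge-disjoint spanning trees any graph can contain, and the function names are hypothetical.

```python
from itertools import combinations


def odd_graph(d):
    """Adjacency map of O_d for d >= 2: vertices are (d-1)-subsets of a
    (2d-1)-element set, adjacent exactly when the subsets are disjoint."""
    vertices = [frozenset(c) for c in combinations(range(2 * d - 1), d - 1)]
    return {v: [u for u in vertices if not (u & v)] for v in vertices}


def spanning_tree_upper_bound(adj):
    """Each spanning tree uses |V| - 1 edges, so at most |E| // (|V| - 1)
    edge-disjoint spanning trees can exist in any graph."""
    n_vertices = len(adj)
    n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    return n_edges // (n_vertices - 1)


o3 = odd_graph(3)  # O_3 is the Petersen graph
print(len(o3), {len(neighbors) for neighbors in o3.values()})
print(spanning_tree_upper_bound(o3))
```

The degree check confirms that $O_d$ is $d$-regular, which is what makes the edge count, and therefore the counting bound, easy to state in terms of $d$.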