• Title/Summary/Keyword: Fault-Tolerance Service

Search Result 58, Processing Time 0.023 seconds

A Design and Implementation of Fault Tolerance Agent on Distributed Multimedia Environment (분산 멀티미디어 환경에서 결함 허용 에이전트의 설계 및 구현)

  • Go, Eung-Nam;Hwang, Dae-Jun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.10
    • /
    • pp.2618-2629
    • /
    • 1999
  • In this paper, we describe the design and implementation of the FDRA(Fault Detection Recovery based on Agent) running on distributed multimedia environment. DOORAE is a good example for distributed multimedia and multimedia distance education system among students and teachers during lecture. It has primitive service agents. Service functions are implemented with objected oriented concept. FDRA is a multi-agent system. It has been environment, intelligent agents interact with each other, either collaboratively or non-collaboratively, to achieve their goals. The main idea is to detect an error by using polling method. This system detects an error by polling periodically the process with relation to session. And, it is to classify the type of error s automatically by using learning rules. The merit of this system is to use the same method to recovery it as it creates a session. FDRA is a system that is able to detect an error, to classify an error type, and to recover automatically a software error based on distributed multimedia environment.

  • PDF

Integrating Resilient Tier N+1 Networks with Distributed Non-Recursive Cloud Model for Cyber-Physical Applications

  • Okafor, Kennedy Chinedu;Longe, Omowunmi Mary
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.7
    • /
    • pp.2257-2285
    • /
    • 2022
  • Cyber-physical systems (CPS) have been growing exponentially due to improved cloud-datacenter infrastructure-as-a-service (CDIaaS). Incremental expandability (scalability), Quality of Service (QoS) performance, and reliability are currently the automation focus on healthy Tier 4 CDIaaS. However, stable QoS is yet to be fully addressed in Cyber-physical data centers (CP-DCS). Also, balanced agility and flexibility for the application workloads need urgent attention. There is a need for a resilient and fault-tolerance scheme in terms of CPS routing service including Pod cluster reliability analytics that meets QoS requirements. Motivated by these concerns, our contributions are fourfold. First, a Distributed Non-Recursive Cloud Model (DNRCM) is proposed to support cyber-physical workloads for remote lab activities. Second, an efficient QoS stability model with Routh-Hurwitz criteria is established. Third, an evaluation of the CDIaaS DCN topology is validated for handling large-scale, traffic workloads. Network Function Virtualization (NFV) with Floodlight SDN controllers was adopted for the implementation of DNRCM with embedded rule-base in Open vSwitch engines. Fourth, QoS evaluation is carried out experimentally. Considering the non-recursive queuing delays with SDN isolation (logical), a lower queuing delay (19.65%) is observed. Without logical isolation, the average queuing delay is 80.34%. Without logical resource isolation, the fault tolerance yields 33.55%, while with logical isolation, it yields 66.44%. In terms of throughput, DNRCM, recursive BCube, and DCell offered 38.30%, 36.37%, and 25.53% respectively. Similarly, the DNRCM had an improved incremental scalability profile of 40.00%, while BCube and Recursive DCell had 33.33%, and 26.67% respectively. In terms of service availability, the DNRCM offered 52.10% compared with recursive BCube and DCell which yielded 34.72% and 13.18% respectively. The average delays obtained for DNRCM, recursive BCube, and DCell are 32.81%, 33.44%, and 33.75% respectively. Finally, workload utilization for DNRCM, recursive BCube, and DCell yielded 50.28%, 27.93%, and 21.79% respectively.

A Multi-objective Optimization Approach to Workflow Scheduling in Clouds Considering Fault Recovery

  • Xu, Heyang;Yang, Bo;Qi, Weiwei;Ahene, Emmanuel
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.3
    • /
    • pp.976-995
    • /
    • 2016
  • Workflow scheduling is one of the challenging problems in cloud computing, especially when service reliability is considered. To improve cloud service reliability, fault tolerance techniques such as fault recovery can be employed. Practically, fault recovery has impact on the performance of workflow scheduling. Such impact deserves detailed research. Only few research works on workflow scheduling consider fault recovery and its impact. In this paper, we investigate the problem of workflow scheduling in clouds, considering the probability that cloud resources may fail during execution. We formulate this problem as a multi-objective optimization model. The first optimization objective is to minimize the overall completion time and the second one is to minimize the overall execution cost. Based on the proposed optimization model, we develop a heuristic-based algorithm called Min-min based time and cost tradeoff (MTCT). We perform extensive simulations with four different real world scientific workflows to verify the validity of the proposed model and evaluate the performance of our algorithm. The results show that, as expected, fault recovery has significant impact on the two performance criteria, and the proposed MTCT algorithm is useful for real life workflow scheduling when both of the two optimization objectives are considered.

PBFT Blockchain-Based OpenStack Identity Service

  • Youngjong, Kim;Sungil, Jang;Myung Ho, Kim;Jinho, Park
    • Journal of Information Processing Systems
    • /
    • v.18 no.6
    • /
    • pp.741-754
    • /
    • 2022
  • Openstack is widely used as a representative open-source infrastructure of the service (IaaS) platform. The Openstack Identity Service is a centralized approach component based on the token including the Memcached for cache, which is the in-memory key-value store. Token validation requests are concentrated on the centralized server as the number of differently encrypted tokens increases. This paper proposes the practical Byzantine fault tolerance (PBFT) blockchain-based Openstack Identity Service, which can improve the performance efficiency and reduce security vulnerabilities through a PBFT blockchain framework-based decentralized approach. The experiment conducted by using the Apache JMeter demonstrated that latency was improved by more than 33.99% and 72.57% in the PBFT blockchain-based Openstack Identity Service, compared to the Openstack Identity Service, for 500 and 1,000 differently encrypted tokens, respectively.

The Inplementation of Fault-Tolerant Dual System Using the Hot-Standby Sparing Technique (핫 스탠바이 스페어링 기법을 이용한 고장 감내 이중화 시스템 설계)

  • Shin Jin wook;Park Dong sun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.10A
    • /
    • pp.1113-1122
    • /
    • 2004
  • This paper is basically to achieve the high-availability and high-reliability of the control system from the implementation of the fault-tolerant system using the hot-standby sparing technique. To meet the objective, we design and implement a board with fault tolerance I/O bus to detect the fault. Warm-standby sparing technique is the fault tolerance technique usually used for switching control system in present. This technique can be easily implemented, but can not detect the fault quickly and can malfunction because of the hardware fault. The hot-standby sparing fault tolerant technique implemented in this paper is consists of dual processor modules and a I/O processor using fault tolerant I/O bus. The proposed method can find the faults as soon as possible, so it can prevent from wrong operation. Also it is possible to normal re-service due to the short recovering time. To implement the fault-tolerant dual system with fault detection be, two daughter, called FTMA and FTIA, boards designed and implemented are applied to the system. And we also simulated the proposed method to verify the high-availability and high-reliability of the control system using Markov process.

A Design of Control Network for DCS in Nuclear Power Plant (원전 DCS용 제어통신망 설계)

  • 이재민;박태림;문홍주;권욱현
    • Proceedings of the IEEK Conference
    • /
    • 2000.06a
    • /
    • pp.85-88
    • /
    • 2000
  • Distributed Control System(DCS) is one of the best solutions to implement control systems because it provides continuous observation of control process and execution of commands to induce proper operations. In this paper, a design of control network for DCS in nuclear power plant is proposed. The proposed control network on DCS has a simple architecture and deterministic property. Thus, the proposed control network offers hard real-time periodic service. It also has redundant media for the fault-tolerance. As a result, high safety and reliability required in nuclear power plant are guaranteed.

  • PDF

DOVE : A Distributed Object System for Virtual Computing Environment (DOVE : 가상 계산 환경을 위한 분산 객체 시스템)

  • Kim, Hyeong-Do;Woo, Young-Je;Ryu, So-Hyun;Jeong, Chang-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.2
    • /
    • pp.120-134
    • /
    • 2000
  • In this paper we present a Distributed Object oriented Virtual computing Environment, called DOVE which consists of autonomous distributed objects interacting with one another via method invocations based on a distributed object model. DOVE appears to a user logically as a single virtual computer for a set of heterogeneous hosts connected by a network as if objects in remote site reside in one virtual computer. By supporting efficient parallelism, heterogeneity, group communication, single global name service and fault-tolerance, it provides a transparent and easy-to-use programming environment for parallel applications. Efficient parallelism is supported by diverse remote method invocation, multiple method invocation for object group, multi-threaded architecture and synchronization schemes. Heterogeneity is achieved by automatic data arshalling and unmarshalling, and an easy-to-use and transparent programming environment is provided by stub and skeleton objects generated by DOVE IDL compiler, object life control and naming service of object manager. Autonomy of distributed objects, multi-layered architecture and decentralized approaches in hierarchical naming service and object management make DOVE more extensible and scalable. Also,fault tolerance is provided by fault detection in object using a timeout mechanism, and fault notification using asynchronous exception handling methods

  • PDF

Simulation and Evaluation of Redistribution Algorithms In Fault-Tolerant Distributed System (결함허용 분산시스템의 재분배 알고리즘의 시뮬레이션과 평가)

  • 최병갑;이천희
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.8
    • /
    • pp.1-10
    • /
    • 1994
  • In this paper load redistribution algorithm to allow fault-tolerance by redistributing the workload of n failure nodes to the remaining good nodes in distributed systems are investigated. To evaluate the efficiency of the algorithms a simulation model of algorithms is developed using SLAM II simulation language. The job arrival rate service rate failure and repair rate of nodes and communication delay time due to load migraion are used as parameters. The result of the simulation shows that the job arrival rate failure and repair rate of nodes do not affected on the relative efficiency of algorithms. If the communication delay time is greater than average job processing time algorithm B is better. Otherwise algorithm C is superior to the others.

  • PDF

Bi-active Load Balancer for enhancing of scalability and fault-tolerance of Cluster System (확장성과 고장 감내를 위한 효율적인 부하 분산기)

  • Kim, Young-Hwan;Youn, Hee-Yong;Choo, Hyun-Seung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.04a
    • /
    • pp.381-384
    • /
    • 2002
  • This paper describes the motivation, design and performance of bi-active Load balancer in Linux Virtual Server. The goal of bi-active Load balancer is to provide a framework to build highly scalable, fault-tolerant services using a large cluster of commodity servers. The TCP/IP stack of Linux Kernel is extended to support three IP load balancing techniques, which can make parallel services of different kinds of server clusters to appear as a service on a single IP address. Scalability is achieved by transparently adding or removing a node in the cluster. and high availability is provided by detecting node or daemon failures and reconfiguring the system appropriately. Extensive simulation reveals that the proposed approach improves the reply rate about 20% compared to earlier design.

  • PDF

JOB Scheduling for process Control in Hierarchical Computer Network (계층구조 Computer Network에서 공정제어를 위한 JOB Scheduling)

  • Park, Yil
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.5 no.1
    • /
    • pp.83-87
    • /
    • 1980
  • The distributive processing job in a hierarchical computer network, which supervises and controls the complex relations between the variables periodically for raising the fault folerance, can be defined its periodicity and its execution time. All the job may be composed of the subsets in relation of Tree structure. For a processor job set this paper finds out a job scheduling algorithm that has the less loose time between period than that of FCFS.

  • PDF