Search | Korea Science

Scalable Approach to Failure Analysis of High-Performance Computing Systems

Shawky, Doaa
- ETRI Journal / v.36 no.6 / pp.1023-1031 / 2014
Failure analysis is necessary to clarify the root cause of a failure, predict the next time a failure may occur, and improve the performance and reliability of a system. However, it is not an easy task to analyze and interpret failure data, especially for complex systems. Usually, these data are represented using many attributes, and sometimes they are inconsistent and ambiguous. In this paper, we present a scalable approach for the analysis and interpretation of failure data of high-performance computing systems. The approach employs rough sets theory (RST) for this task. The application of RST to a large publicly available set of failure data highlights the main attributes responsible for the root cause of a failure. In addition, it is used to analyze other failure characteristics, such as time between failures, repair times, workload running on a failed node, and failure category. Experimental results show the scalability of the presented approach and its ability to reveal dependencies among different failure characteristics.
https://doi.org/10.4218/etrij.14.0113.1133
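The abstract above does not say which rough-set measures the paper computes. As a minimal sketch of one standard RST quantity, the dependency degree of a decision attribute on a set of condition attributes, the following uses a hypothetical toy table; the attribute names (`node_type`, `workload`, `cause`) and the records are illustrative assumptions, not the paper's data set or method.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def dependency_degree(rows, condition_attrs, decision_attr):
    """Share of rows in the positive region.

    A row is in the positive region when every row that is
    indiscernible from it (same condition values) has the same
    decision value. Assumes rows is non-empty.
    """
    positive = 0
    for block in partition(rows, condition_attrs):
        decisions = {rows[i][decision_attr] for i in block}
        if len(decisions) == 1:
            positive += len(block)
    return positive / len(rows)


# Hypothetical failure records (illustrative only).
records = [
    {"node_type": "compute", "workload": "graphics", "cause": "hardware"},
    {"node_type": "compute", "workload": "graphics", "cause": "hardware"},
    {"node_type": "compute", "workload": "compute", "cause": "software"},
    {"node_type": "frontend", "workload": "compute", "cause": "software"},
    {"node_type": "frontend", "workload": "compute", "cause": "network"},
]

gamma_all = dependency_degree(records, ["node_type", "workload"], "cause")
gamma_workload = dependency_degree(records, ["workload"], "cause")
```

Comparing the dependency degree with and without an attribute is one common way to judge how much that attribute contributes to determining the decision; here dropping `node_type` lowers the degree, so it carries information about `cause` in this toy table.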

Improving TCP Performance through Pre-detection of Route Failure in Mobile Ad Hoc Networks (Ad Hoc 망에서 경로단절 사전감지를 통한 TCP 성능향상)

Lee Byoung-Yeul;Lim Jae-Sung
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.11B / pp.900-910 / 2004
In ad hoc networks, route failure is mainly caused by the mobility of mobile hosts. A route failure, that is, the loss of the route from source to destination, may lead to sudden packet losses and delays. In this situation, TCP assumes that congestion has occurred within the network and initiates its congestion control procedures. The congestion control algorithm provides the means for the source to deal with lost packets. TCP performance in ad hoc environments is degraded because the TCP source cannot distinguish congestion from route failure. In this paper, we propose TCP-P, a pre-detection approach to dealing with route failure. TCP-P freezes TCP through pre-detection of route failure. In the proposed mechanism, route failure information is obtained not from the routing protocol but from the MAC protocol. An intermediate node that obtains route failure information from its MAC layer relays the information to the TCP source and lets the TCP source stop the congestion control algorithm. Results reveal that TCP-P, by responding in a proactive manner, outperforms other approaches in terms of communication throughput in the presence of node mobility.
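The freezing idea in the abstract above can be illustrated with a small sketch. This is not the TCP-P implementation: the class name, method names, and the simplified timeout reaction (slow-start threshold halved, window reset to one segment) are assumptions chosen only to show how a frozen sender would ignore losses attributed to a predicted route failure.

```python
class FreezableSender:
    """Toy TCP-like sender whose congestion control can be frozen."""

    def __init__(self, cwnd=16):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = 64        # slow-start threshold, in segments
        self.frozen = False

    def on_route_failure_predicted(self):
        # Hypothetical hook: called when an intermediate node relays
        # MAC-layer route failure information to the source.
        self.frozen = True

    def on_route_restored(self):
        # Hypothetical hook: a usable route is available again.
        self.frozen = False

    def on_timeout(self):
        if self.frozen:
            # Loss is attributed to the route failure, not congestion,
            # so the window state is left untouched.
            return
        # Simplified congestion reaction to a retransmission timeout.
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1


sender = FreezableSender(cwnd=16)
sender.on_route_failure_predicted()
sender.on_timeout()
cwnd_while_frozen = sender.cwnd

sender.on_route_restored()
sender.on_timeout()
cwnd_after_congestion = sender.cwnd
```

While frozen, the timeout leaves the window at its previous value; once the route is restored, the same event triggers the ordinary congestion reaction.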

A management scheme of agent node in crowd group (군집 그룹에서 에이전트 노드 관리 방안)

Park, Sangjoon;Lee, Jongchan
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.537-538 / 2021
In this paper, we consider an agent management scheme for data gathering in a crowd group. For critical data gathered in a dangerous region, the possibility of mission failure caused by sensor node damage can be high. Hence, we study how an endangered agent node is handled through a cooperative network method among the sensor nodes.

New slave-node constraints and element for adaptive analysis of C⁰ plates

Sze, K.Y.;Wu, D.
- Structural Engineering and Mechanics / v.39 no.3 / pp.339-360 / 2011
In h-type adaptive analysis, when an element is refined or subdivided, new nodes are added. Among them are the transition nodes, which are the corner nodes of the new elements formed by subdivision and, simultaneously, the mid-side nodes of the adjacent non-subdivided elements. To secure displacement compatibility, the slave-node approach, in which the DOFs of a transition node are constrained by those of the adjacent nodes, has been used. Alternatively, transition elements which possess the transition nodes as active mid-side/-face nodes can be used. For C⁰ plate analyses, the conventional slave-node constraints and the previously derived ANS transition elements are implemented. In both implementations, the four-node element is the ANS element. With reference to the predictions of the transition elements, the slave-node approach not only delivers erroneous results but also fails the patch test. In this paper, the patch test failure is resolved by developing a set of new constraints with which the slave-node approach surpasses the transition-element approach. The accuracy of the slave-node approach is further improved by developing a hybrid four-node element in which the assumed moment and shear force modes are in strict equilibrium.
https://doi.org/10.12989/sem.2011.39.3.339

Providing survivability for virtual networks against substrate network failure

Wang, Ying;Chen, Qingyun;Li, Wenjing;Qiu, Xuesong
- KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4023-4043 / 2016
Network virtualization has been regarded as a core attribute of the Future Internet. In a network virtualization environment (NVE), multiple heterogeneous virtual networks can coexist on a shared substrate network. Thus, a substrate network failure may affect multiple virtual networks. It is therefore increasingly critical to provide survivability for the virtual networks against substrate network failures. Previous research focused on mechanisms that ensure the resilience of the virtual network. However, resource efficiency is still important to make the mapping scheme practical. In this paper, we study survivable virtual network embedding mechanisms against substrate link and node failure from the perspective of improving resource efficiency. For substrate link survivability, we propose a load-balancing and re-configuration strategy to improve the acceptance ratio and bandwidth utilization ratio. For substrate node survivability, we develop a minimum cost heuristic based on a divided network model and a backup resource cost model, which can both satisfy the location constraints of virtual nodes and increase the sharing degree of the backup resources. Simulations are conducted to evaluate the performance of the solutions. The proposed load-balancing and re-configuration strategy for substrate link survivability outperforms other approaches in terms of acceptance ratio and bandwidth utilization ratio. The proposed minimum cost heuristic for substrate node survivability performs well in terms of acceptance ratio.
https://doi.org/10.3837/tiis.2016.09.001

Packet Lossless Fast Rerouting Scheme without Buffer Delay Problem in MPLS Networks (MPLS망에서 버퍼지연 문제가 발생하지 않는 무손실 Fast Rerouting 기법)

신상헌;신해준;김영탁
- Journal of KIISE:Information Networking / v.31 no.2 / pp.233-241 / 2004
In this paper, we propose a packet-lossless fast rerouting scheme for link/node faults in MPLS (Multiprotocol Label Switching) networks that minimizes the accumulated buffer delay problem at the ingress node. The proposed scheme uses a predefined, alternative LSP (Label Switched Path) in order to restore user traffic. We propose two restoration approaches. In the first approach, an alternative LSP is initially allocated with more bandwidth than the protected working LSP during the failure recovery phase. After the failure recovery, the excessively allocated bandwidth of the alternative LSP is readjusted to the bandwidth of the working LSP. In the second approach, we reduce the length of the protected working LSP by using segment-based restoration. The proposed approaches have the merits of (ⅰ) no buffer delay problem at the ingress node after failure recovery, and (ⅱ) a smaller required buffer size at the ingress node than in the previous approach.

A Path Restoration Method Independent of Failure Location in All-Optical Networks (전광 통신망에서 장애 위치에 독립적인 경로 복구 방법)

이명문;유진태;김용범;박진우
- The Journal of Korean Institute of Communications and Information Sciences / v.26 no.11C / pp.85-93 / 2001
In this paper, a path restoration method independent of failure location in all-optical networks is proposed and its wavelength requirements are calculated. In the proposed method, since a single backup wavelength is used for any link failure, a node can consist of only fixed-wavelength transmitters, which lowers the node cost. Moreover, if combined with the edge-disjoint path restoration method, the restoration process can be triggered just after failure detection. This feature and the parallel cross-connection message transfer technique proposed in this paper make the restoration process faster. It is also shown that the wavelength requirements of the proposed method are similar to those of the method using a tunable backup wavelength, resulting in little increase in transmission cost.

Patterns of initial failure after resection for gallbladder cancer: implications for adjuvant radiotherapy

Kim, Tae Gyu
- Radiation Oncology Journal / v.35 no.4 / pp.359-367 / 2017
Purpose: This study sought to identify potential candidates for adjuvant radiotherapy and patterns of regional failure in patients who underwent curative-intent surgery for gallbladder cancer. Materials and Methods: Records for 70 patients with gallbladder cancer who underwent curative resection at a single institution between 2000 and 2016 were analysed retrospectively. No patients received adjuvant radiotherapy. Initial patterns of failure were evaluated. Regional recurrence was categorized according to the definitions of lymph node stations suggested by the Japanese Society of Hepato-Biliary-Pancreatic Surgery. Results: Median follow-up was 23 months. Locoregional recurrence as any component of first failure occurred in 29 patients (41.4%), with isolated locoregional recurrence in 13 (18.6%). Regional recurrence occurred in 23 patients, and 77 regional recurrences were identified. Commonly involved regional stations were #13, #12a2, #12p2, #12b2, #16a2, #16b1, #9, and #8. Independent prognostic factors for locoregional recurrence were ≥pT2 disease (hazard ratio [HR], 5.510; 95% confidence interval [CI], 1.260-24.094; p = 0.023) and R1 resection (HR, 6.981; 95% CI, 2.378-20.491; p < 0.001). Conclusion: Patients with pT2 disease or R1 resection after curative surgery for gallbladder cancer may benefit from adjuvant radiotherapy. Our findings on regional recurrence may help physicians construct a target volume for adjuvant radiotherapy.
https://doi.org/10.3857/roj.2017.00388

Failure Analysis of Reinforced Concrete Slabs using Pseudo-Volume Control Method (의사체적제어법을 이용한 철근콘크리트 슬래브의 파괴거동 해석)

심상효;송하원;최강룡;남상혁;변근주
- Proceedings of the Korea Concrete Institute Conference / 2000.04a / pp.577-582 / 2000
The pseudo-volume control method is developed for the failure analysis of RC slabs by adding a pressure node to a layered shell element that utilizes in-plane constitutive models of reinforced concrete and a layered formulation. For the failure analysis of RC slabs in this paper, geometric nonlinearity is also considered in the analysis. The validity of the pseudo-volume control method is verified by comparing the analysis results with existing experimental results.

Dual Sink Nodes for Sink Node Failure in Wireless Sensor Networks (무선 센서 네트워크에서의 싱크노드 실패에 대비한 이중 싱크노드 장치)

Kim, Dae-Il;Park, Lae-Jeong;Park, Sung-Wook;Lee, Hyung-Bong;Moon, Jung-Ho;Chung, Tae-Yun
- IEMEK Journal of Embedded Systems and Applications / v.6 no.6 / pp.369-376 / 2011
Since wireless sensor networks generally have the capability of network recovery, malfunction of a few sensor nodes in a sensor network does not cause a crucial problem paralyzing the sensor network. The malfunction of the sink node, however, is critical. If the sink node of a sensor network stops working, the data collected by sensor nodes cannot be delivered to the gateway because no other sensor nodes can take the place of the sink node. This paper proposes a TDMA-based wireless sensor network equipped with dual sink nodes, with a view to preventing data loss in the case of malfunction of a sink node. A secondary sink node, which synchronizes with a primary sink node and receives data from other sensor nodes in normal situations, takes the role of the primary sink node in the case of malfunction of the primary sink, thereby eliminating the possibility of data loss. The effectiveness of the proposed scheme is demonstrated through experiments.
https://doi.org/10.14372/IEMEK.2011.6.6.4
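The takeover behaviour described in the abstract above can be sketched as a small state machine. The abstract does not state how the secondary sink detects a malfunction, so the detection rule here (a run of TDMA frames without a primary beacon) and every name, including `SecondarySink` and `missed_limit`, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SecondarySink:
    """Standby sink that mirrors sensor data and takes over on failure."""

    missed_limit: int = 3                 # hypothetical takeover threshold
    missed: int = 0                       # consecutive frames without a beacon
    active: bool = False                  # True once acting as the sink
    buffer: list = field(default_factory=list)
    forwarded: list = field(default_factory=list)

    def on_frame(self, primary_beacon_seen, readings):
        """Process one TDMA frame of sensor readings."""
        if self.active:
            # Acting as the sink: deliver readings to the gateway.
            self.forwarded.extend(readings)
            return
        if primary_beacon_seen:
            # Primary is alive and assumed to have delivered this frame.
            self.missed = 0
            self.buffer.clear()
            return
        # Primary silent: keep the data in case it never reached the gateway.
        self.missed += 1
        self.buffer.extend(readings)
        if self.missed >= self.missed_limit:
            self.active = True
            self.forwarded.extend(self.buffer)
            self.buffer.clear()


sink = SecondarySink(missed_limit=2)
frames = [(True, ["a"]), (False, ["b"]), (False, ["c"]), (False, ["d"])]
for beacon_seen, readings in frames:
    sink.on_frame(beacon_seen, readings)
```

Because the standby sink buffers every frame received while the primary is silent, the readings from the detection window are forwarded at takeover rather than dropped. Fail-back to a recovered primary is deliberately left out of this sketch.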
