Search | Korea Science

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
- Journal of Intelligence and Information Systems, v.26 no.4, pp.127-148, 2020
A data center is a physical facility that houses computer systems and related components, and it is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. With the growth of cloud computing in particular, a proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and to prevent failures. A failure in one element of the facility may affect not only that equipment but also other connected equipment, and may cause enormous damage. IT facilities in particular fail irregularly because of their interdependence, which makes the cause difficult to identify. Previous studies on failure prediction in data centers treated each server as a single, isolated state and did not consider that devices are interconnected. In this study, data center failures were therefore classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and the analysis focused on complex failures occurring inside the server. Failures external to the server include power, cooling, and user errors. Because such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. By contrast, the cause of a failure occurring inside a server is difficult to determine, and adequate prevention has not yet been achieved. The main reason is that server failures do not occur in isolation: a failure may cause failures in other servers, or may itself be triggered by something received from other servers. In other words, whereas existing studies analyzed failures on the assumption of a single server that does not affect other servers, this study analyzed failures on the assumption that servers affect one another.
To define complex failure situations in the data center, the failure history of each piece of equipment in the data center was used. Four major failure types were considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures recorded for each device were sorted in chronological order, and when a failure on one device was followed by a failure on another device within 5 minutes, the two failures were defined as occurring simultaneously. After sequences were built for the devices that failed at the same time, the 5 devices that most frequently failed together within those sequences were selected, and the cases in which the selected devices failed simultaneously were confirmed through visualization. Because the server resource information collected for failure analysis is a time series with temporal flow, Long Short-Term Memory (LSTM), a deep learning algorithm that predicts the next state from previous states, was used. In addition, unlike the single-server case, a Hierarchical Attention Network deep learning model structure was used, because the degree to which each server contributes to a complex failure differs. This algorithm increases prediction accuracy by giving a server more weight as its impact on the failure increases. The study began by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data was analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. In the second experiment, prediction accuracy in the multiple-server case was improved by optimizing the threshold for each server.
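The 5-minute co-occurrence rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `group_simultaneous`, the device labels, and the choice to measure the 5-minute window from the first failure of each group are all hypothetical readings of the abstract.

```python
from datetime import datetime, timedelta

# Assumption: the window is measured from the first failure in each group.
WINDOW = timedelta(minutes=5)

def group_simultaneous(events, window=WINDOW):
    """Group (timestamp, device) failure events that fall within `window`
    of the first event in the current group."""
    groups = []
    for ts, device in sorted(events):  # chronological order
        if groups and ts - groups[-1][0][0] <= window:
            groups[-1].append((ts, device))   # same incident
        else:
            groups.append([(ts, device)])     # start a new incident
    return groups

# Hypothetical failure log; device names are made up for illustration.
events = [
    (datetime(2020, 1, 1, 10, 3), "srv-B"),
    (datetime(2020, 1, 1, 10, 0), "srv-A"),
    (datetime(2020, 1, 1, 10, 4), "db-1"),
    (datetime(2020, 1, 1, 10, 20), "srv-A"),
]
groups = group_simultaneous(events)
devices_per_group = [[d for _, d in g] for g in groups]
print(devices_per_group)
```

Counting how often each device appears in groups of more than one event would then give the devices that most frequently fail together.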
In the first experiment, which compared the single-server and multiple-server assumptions, the single-server model predicted no failure for three of the five servers even though failures had actually occurred. Under the multiple-server assumption, however, all five servers were predicted to have failed. This result confirms the hypothesis that servers affect one another. The study thus showed that prediction performance was better under the multiple-server assumption than under the single-server assumption. In particular, applying the Hierarchical Attention Network algorithm, which assumes that each server's influence differs, helped improve the analysis. In addition, applying a different threshold to each server improved prediction accuracy. This study showed that failures whose cause is difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring in data center servers. The results are expected to help prevent failures before they occur.
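The two ideas in this paragraph, weighting servers by their influence and choosing a separate threshold per server, can be illustrated with a small sketch. It is not the model from the paper: the attention scores are supplied by hand rather than learned, the F1 score is an assumed selection criterion (the abstract does not name one), and all function names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw per-server scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(server_vectors, scores):
    """Attention-weighted sum of per-server feature vectors."""
    weights = softmax(scores)
    dim = len(server_vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, server_vectors))
               for i in range(dim)]
    return weights, context

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(y_true, probs):
    """Pick the cut-off in 0.1..0.9 with the highest F1 (assumed criterion)."""
    candidates = [i / 10 for i in range(1, 10)]
    return max(candidates, key=lambda c: f1(y_true, [p >= c for p in probs]))

# Two hypothetical servers with equal scores receive equal weight.
weights, context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Hypothetical validation labels and predicted failure probabilities for one server.
threshold = best_threshold([0, 0, 1, 1], [0.1, 0.35, 0.45, 0.8])
print(weights, context, threshold)
```

Running `best_threshold` once per server on that server's own validation data yields one cut-off per server, which is the spirit of the second experiment.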
https://doi.org/10.13088/jiis.2020.26.4.127

Interpretation of Microscale Behaviors and Precision Measurement Monitoring for the Five-story and Seven-story Stone Pagodas from Cheongnyangsaji Temple Site in Gongju, Korea (공주 청량사지 오층석탑 및 칠층석탑의 정밀 계측모니터링과 미세거동 해석)

LEE Jeongeun;PARK Seok Tae;LEE Chan Hee
- Korean Journal of Heritage: History & Science, v.56 no.4, pp.132-158, 2023
The five-story and seven-story stone pagodas at the Cheongnyangsaji temple site in Gongju stand below Sambulbong peak of Gyeryongsan mountain and are known to have been built in the middle of the Goryeo dynasty. Their historical and academic value is recognized because they show two types of Baekje-style stone pagoda coexisting in one era. The seven-story pagoda was toppled during a robbery in 1944, which in turn left the five-story pagoda tilted. Although the two pagodas were restored in 1961, concerns about their structural instability have been raised continuously. In this study, measurement data accumulated from May 2021 to March 2022 were reviewed for seasonal characteristics, and the microscale behavior of the pagodas was analyzed in relation to temperature and precipitation over the same period. The results show that micro thermoelastic behavior repeated with the daily temperature change at all sensors, and that both slope and displacement showed microscale behavior. The inclinometers recorded micro movements as moisture on the surface and inside the stones repeatedly expanded and contracted with temperature change. In particular, the upper part of the five-story pagoda moved up to 3.89° to the northwest, and the seven-story pagoda tilted up to 0.078° to the northeast. The maximum displacements were 0.127 mm and 0.149 mm for the five-story and seven-story pagodas, respectively. These values tended to return toward the original position by the end of the measurement period but did not recover completely, indicating that precise monitoring is required. The results can serve as basic data for the stable conservation of the two stone pagodas. Building on them, behavioral characteristics should be analyzed with various environmental factors taken into account, and preventive conservation should continue through maintenance of the measurement system installed for this study.
https://doi.org/10.22755/kjchs.2023.56.4.132
