• Title/Summary/Keyword: time series & cluster analysis

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Preventing failures through anomaly detection is becoming increasingly important in the maintenance of ICT infrastructure. System monitoring data is multidimensional time series data, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data is usually preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field; early work used statistical methods and regression analysis, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data is non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contains noise or outliers, which restricts them to training data free of noise and outliers. An autoencoder built on artificial neural networks is trained to produce output as similar as possible to its input. It has several advantages over existing probability and linear models, cluster analysis, and supervised learning. It can be applied to data that does not satisfy a probability distribution or linearity assumption, and it can be trained in an unsupervised manner, without labeled data. However, autoencoders are limited in identifying local outliers in multidimensional data, and the dimension of the data increases greatly because of the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck and thereby learn correlations. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance for 41 variables was checked in the proposed and comparison models. Reconstruction performance differs by variable; for the Memory, Disk, and Network modalities the loss is small in all three autoencoder models, so reconstruction works well. The Process modality showed no significant difference among the three models, and the CPU modality was reconstructed best by CMAE. ROC curves were prepared to evaluate anomaly detection performance, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, AE. In particular, the recall of CMAE was 0.9828, which indicates that it detects almost all anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the resulting dimensional increase can slow down inference. The proposed model is easy to apply in practice in terms of inference speed and model management.
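The abstract above says that time is supplied as a condition so the autoencoder can learn periodicity without widening the input through sliding windows. The abstract does not state how time is encoded, so the following is only a minimal sketch under one common assumption: the hour of day is mapped to a sine/cosine pair and appended to the feature vector. The function names `encode_time_of_day` and `conditional_input` are hypothetical.

```python
import math


def encode_time_of_day(hour, period=24.0):
    """Map an hour to a point on the unit circle (assumed encoding, not from the paper)."""
    angle = 2.0 * math.pi * (hour % period) / period
    return (math.sin(angle), math.cos(angle))


def conditional_input(features, hour):
    """Append the time condition to one sample's monitoring features."""
    return list(features) + list(encode_time_of_day(hour))


# Example: three metric values observed at 06:00 become a 5-dimensional input.
sample = conditional_input([0.1, 0.2, 0.3], hour=6)
```

With this encoding the condition adds two dimensions regardless of how long the period is, and 23:00 ends up close to 00:00, which a raw hour index would not capture.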

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.395-406
    • /
    • 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data collected by KPX (Korea Power Exchange) between 2008 and 2012 was used. Because electricity demand is time series data, the time trend was removed before the daily demand patterns were extracted. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. Classification analysis was then conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data was divided into training data and test data. The training data consisted of the external factors and the cluster number for 2008 to 2011. The test data was the daily external-factor data for 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.
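The clustering step described above can be illustrated with a small sketch. The abstract does not say how the time trend was removed, so dividing each day by its own mean is an assumption here, and the plain k-means below (with a deterministic initialization from the first `k` profiles) stands in for the three methods the study compared. The toy profiles use 4 values per day instead of 24 to keep the example short.

```python
def normalize_profile(day):
    """Divide a daily load profile by its mean so only the shape remains (assumed detrending)."""
    mean = sum(day) / len(day)
    return [value / mean for value in day]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain k-means; centroids start at the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: flat days and evening-peak days at different overall demand levels.
days = [[10, 10, 10, 10], [5, 5, 15, 15], [20, 20, 20, 20], [10, 10, 30, 30]]
profiles = [normalize_profile(day) for day in days]
labels, centroids = kmeans(profiles, k=2)
```

After normalization, days with the same shape but different demand levels fall into the same cluster, which is the point of removing the trend before clustering.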

Comparative analysis of linear model and deep learning algorithm for water usage prediction (물 사용량 예측을 위한 선형 모형과 딥러닝 알고리즘의 비교 분석)

  • Kim, Jongsung;Kim, DongHyun;Wang, Wonjoon;Lee, Haneul;Lee, Myungjin;Kim, Hung Soo
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1083-1093
    • /
    • 2021
  • Predicting water usage is essential for establishing an optimal supply operation plan and reducing power consumption. However, water usage by consumers is non-linear because of factors such as user type, usage pattern, and weather conditions. To predict water consumption, we therefore propose a methodology that links several techniques able to account for the non-linear characteristics of water use, which we call the KWD framework. Specifically, K-means (K) cluster analysis was performed to group individual consumers with similar usage patterns; a Wavelet (W) transform was then applied to derive the main periodic pattern of the usage by removing noise components; and a Deep (D) learning algorithm was used to learn the non-linear characteristics of water usage. The performance of the proposed framework was analyzed by comparison with the ARMA model, which is a linear time series model. The proposed model showed a correlation of 92%, while the ARMA model showed about 39%. The proposed model therefore performed better than the linear time series model, and the KWD framework could be used for other non-linear time series with patterns similar to water usage. If the KWD framework is used, it will be possible to predict water usage accurately and to establish an optimal supply plan for a variety of events.
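The wavelet step of the KWD framework removes noise components before the series is passed to the learning algorithm. The abstract does not name the wavelet family or the thresholding rule, so the sketch below assumes a single-level Haar transform with hard thresholding of the detail coefficients; it requires an even-length input, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Single-level Haar transform: pairwise averages (approx) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return signal


def haar_denoise(signal, threshold):
    """Zero out small detail coefficients (assumed hard threshold), then invert."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


# The small wiggle in the first pair is smoothed; the large step in the second pair is kept.
smoothed = haar_denoise([4.0, 4.2, 8.0, 2.0], threshold=0.5)
```

A practical pipeline would use several decomposition levels and a data-driven threshold; the single level here only shows the mechanism.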

Methoden zur Beschreibung des Unfallgeschehens - Versuch eines Vergleichs zwischen der Bundesrepublik Deutschland und der Republik Korea - (한국과 서독간의 교통안전 비교)

  • 김홍상
    • Journal of Korean Society of Transportation
    • /
    • v.5 no.2
    • /
    • pp.55-72
    • /
    • 1987
  • The work analyzes the existing situation and defines special problems concerning traffic accidents in the two countries. The report is divided into three parts: 1) Using the global approach of SMEED, the data were evaluated using multiple regression analysis, and homogeneous groups of countries were defined by cluster analysis. In the global approach, the linear model is better than SMEED's non-linear model in explaining the number of fatalities. Among the different groups of countries, the linear approach was found to be better suited for industrialized countries and the non-linear approach better for the developing countries. The comparison of traffic fatality data for the Federal Republic of Germany and the Republic of Korea showed different regression equations during the same time period. 2) The BOX/JENKINS time series analysis on a monthly basis clearly shows similar seasonal patterns for the two countries over the years studied. The decrease in traffic accidents following the intensification of the safety belt requirement was confirmed with the ARIMA model. It amounts to 7 to 8 percent fewer personal injury accidents and fatal accidents. The identified increase in safety in the Federal Republic of Germany since the 1970s is mainly due to the reduction of accident severity in residential areas. 3) Speeds and headways on motorways in the two countries were also compared. The measurements indicate that German road users drive faster, take more risks, and accept shorter time gaps than Korean road users. However, the accident statistics show accident rates for Korea that are several times higher than those in the Federal Republic of Germany.

Concentration variability of atmospheric radon and gaseous pollutants at background area of Korea between 2017 and 2018

  • Kim, Won-Hyung;Yang, Hyo-Sun;Bu, Jun-Oh;Kang, Chang-Hee;Song, Jung-Min;Chambers, S.
    • Analytical Science and Technology
    • /
    • v.35 no.1
    • /
    • pp.32-40
    • /
    • 2022
  • The concentrations of radon in the atmosphere were measured at the Gosan site on Jeju Island during 2017-2018, in order to investigate their time-series variation and their dependence on airflow transport pathways. The mean $^{222}$Rn concentration was 2,480 mBq m$^{-3}$, and the monthly concentration in November was 3,262 mBq m$^{-3}$, more than twice that in July (1,459 mBq m$^{-3}$). The diurnal radon concentration increased throughout the nighttime to a maximum (2,862 mBq m$^{-3}$) at around 7 a.m., then gradually decreased throughout the daytime to a minimum (1,997 mBq m$^{-3}$) at around 3 p.m. The seasonal and monthly variations of CO, NO$_2$, and O$_3$ showed a roughly similar pattern to that of radon for the same period, high in winter and low in summer. The cluster back trajectory analysis showed that about 60 % of the overall airflow pathways were influenced by airflow from China. The concentrations of radon and gaseous pollutants were relatively high when the airflow was influenced by the Chinese continent, but much lower when it was influenced by the northern Pacific Ocean.

A study on Digital Agriculture Data Curation Service Plan for Digital Agriculture

  • Lee, Hyunjo;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.2
    • /
    • pp.171-177
    • /
    • 2022
  • In this paper, we propose a service method that provides insight into multi-source agricultural data, clusters environmental factors to support time-based data analysis, and curates crop environmental factors. The proposed curation service consists of four steps: collection, preprocessing, storage, and analysis. First, in the collection step, the service system collects and organizes multi-source agricultural data by using an OpenAPI-based web crawler. Second, in the preprocessing step, the system performs data smoothing to reduce data measurement errors. Here, we adopt a smoothing method for each type of facility, such as greenhouses and open fields, in consideration of the error rate associated with the facility's characteristics. Third, in the storage step, an agricultural data integration schema and a Hadoop HDFS-based storage structure are proposed for large-scale agricultural data. Finally, in the analysis step, the service system performs DTW-based time series classification in consideration of the characteristics of agricultural digital data. With DTW-based classification, the accuracy of prediction results is improved because the characteristics of the time series data are reflected without loss. As future work, we plan to implement the proposed service method and apply it to a smart farm greenhouse for testing and verification.
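The analysis step relies on DTW (dynamic time warping), which compares two series while allowing them to be stretched or shifted in time. The abstract gives no implementation details, so the following is only a textbook sketch: the standard dynamic-programming DTW distance with absolute difference as the local cost, paired with a 1-nearest-neighbour classifier. The classifier choice, the labels, and the function names are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn(query, labelled_series):
    """Return the label of the reference series closest to the query under DTW."""
    nearest = min(labelled_series, key=lambda item: dtw_distance(query, item[0]))
    return nearest[1]


# Hypothetical reference patterns, e.g. a daytime temperature rise versus a dip.
references = [([0, 1, 2, 1, 0], "peak"), ([2, 1, 0, 1, 2], "dip")]
label = classify_1nn([0, 0, 1, 2, 1, 0], references)
```

Because warping absorbs the repeated leading value in the query, the delayed peak still matches the "peak" reference with zero distance.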

Statistical Properties of Flare Variability, Energy, and Frequency in Low-Mass Stars

  • Chang, Seo-Won;Byun, Yong-Ik
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.36 no.1
    • /
    • pp.29.2-29.2
    • /
    • 2011
  • Although stellar flares have a long history of observation, there is little concrete understanding of the underlying physical processes or of meaningful correlations with other stellar properties. Most previous observations dealt with only a small number of sample stars and were therefore not sufficient to support generalized statistical studies. Based on one-month-long MMT time-series observations of the open cluster M37, we monitored light variations of nearly 2,500 M-dwarf stars and successfully identified 606 flare events from 422 stars. This is a rare attempt to estimate true flare rates and properties among many stars of the same age and mass group. For each flare, we considered both observational and physical parameters, including flare shape, duration before and after the peak, baseline magnitude before and after the peak, peak magnitude, total energy, and peak energy. We find significant correlations between some of the key parameters over a wide range of energy ($Er = 10^{32} \sim 10^{36}$ ergs). For instance, regardless of stellar luminosity, the energy power spectrum of flares can be approximated by a power law ($\beta = 0.83 - 0.97$). This suggests that flares follow similar physical mechanisms for atmospheric heating and cooling among these low-mass stars. From this MMT data set, we derived an average flaring rate of $0.019\ \mathrm{hr}^{-1}$ among flare stars and $0.003\ \mathrm{hr}^{-1}$ for all M-dwarf candidates. We will report the details of our analysis and discuss physical implications.

Analysis of the differences in living population changes and regional responses by COVID-19 outbreak in Seoul (코로나-19에 따른 서울시 생활인구 변화와 동별 반응 차이 분석)

  • Jin, Juhae;Seong, Byeongchan
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.6
    • /
    • pp.697-712
    • /
    • 2020
  • New infectious diseases have broken out repeatedly across the world over the last 20 years, and COVID-19 is causing drastic changes and damage to daily lives. Furthermore, since new epidemics will undoubtedly appear in the future, there is a continuing need to develop measures for responding to economic damage. Against this backdrop, the living population is an important indicator of changes in citizens' life patterns. This study analyzes time-based and socio-environmental characteristics by detecting and classifying changes in everyday life caused by COVID-19 from the perspective of the floating population. k-shape clustering is used to classify the hourly living population data of each of the 424 dongs in Seoul; then, by applying intervention analysis and one-way ANOVA, each cluster's characteristics and the changes in its living population in the aftermath of COVID-19 are examined. In conclusion, this study confirms that each cluster shows distinct changes in population flows before and after the confirmation of coronavirus patients, and it distinguishes the groups that reacted sensitively at the intervention times defined by COVID-related incidents from those that did not.

Application of the Poisson Cluster Rainfall Generation Model to the Urban Flood Analysis (포아송 클러스터 강우 생성 모형을 이용한 도시 홍수 해석)

  • Park, Hyunjin;Yang, Jungsuk;Han, Jaemoon;Kim, Dongkyun
    • Journal of Korea Water Resources Association
    • /
    • v.48 no.9
    • /
    • pp.729-741
    • /
    • 2015
  • This study examined the applicability of the MBLRP (Modified Bartlett-Lewis Rectangular Pulse) rainfall generation model, a type of Poisson cluster rainfall generation model, to urban flood simulation. An XP-SWMM model, a two-dimensional pipe network and surface flood simulation program, was constructed for the Namgajwa area of the Hongjecheon basin, and flood discharge and flooded area were computed from 200 years of synthetic rainfall time series generated by the MBLRP model. The flood results based on synthetic rainfall were compared with the corresponding values based on design rainfall. The flooded area computed with the MBLRP model was somewhat smaller than the corresponding design-rainfall values. The underestimation ranged from 8% (5-year return period) to 34% (200-year return period) and increased with the return period. This study is meaningful in that it proposes a methodology for quantifying the uncertain variables related to flooding through Monte Carlo analysis of urban flood simulation, and examines its applicability and limitations.
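For readers unfamiliar with Poisson cluster rainfall models, the sketch below generates an hourly synthetic series with a simplified Bartlett-Lewis rectangular pulse process: storms arrive as a Poisson process, each storm spawns rain cells for an exponentially distributed time, and each cell is a rectangular pulse with exponential duration and intensity. This is the basic form, not the modified (MBLRP) variant used in the study, and all parameter names and values are illustrative assumptions rather than calibrated values.

```python
import random


def add_pulse(rain, start, end, intensity):
    """Spread a rectangular pulse (depth per hour) over the hourly bins it overlaps."""
    hour = int(start)
    while hour < len(rain) and hour < end:
        overlap = min(end, hour + 1) - max(start, hour)
        rain[hour] += intensity * overlap
        hour += 1


def simulate_blrp(hours, storm_rate, cell_rate, storm_end_rate,
                  cell_end_rate, mean_intensity, seed=0):
    """Simplified Bartlett-Lewis rectangular pulse simulation (rates are per hour)."""
    rng = random.Random(seed)
    rain = [0.0] * hours
    storm_start = rng.expovariate(storm_rate)
    while storm_start < hours:
        storm_length = rng.expovariate(storm_end_rate)
        cell_start = storm_start  # the first cell begins with the storm
        while cell_start < storm_start + storm_length:
            duration = rng.expovariate(cell_end_rate)
            intensity = rng.expovariate(1.0 / mean_intensity)
            add_pulse(rain, cell_start, cell_start + duration, intensity)
            cell_start += rng.expovariate(cell_rate)
        storm_start += rng.expovariate(storm_rate)
    return rain


# One year of hourly rainfall with made-up parameters.
series = simulate_blrp(hours=24 * 365, storm_rate=0.01, cell_rate=0.5,
                       storm_end_rate=0.1, cell_end_rate=1.0, mean_intensity=2.0)
```

Repeating the simulation with different seeds yields an ensemble of rainfall series, which is what makes a Monte Carlo flood analysis of the kind described above possible.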

A Study on Korean Local Governments' Operation of Participatory Budgeting System : Classification by Support Vector Machine Technique (한국 지방자치단체의 주민참여예산제도 운영에 관한 연구 - Support Vector Machine 기법을 이용한 유형 구분)

  • Junhyun Han;Jaemin Ryou;Jayon Bae;Chunghyeok Im
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.461-466
    • /
    • 2024
  • Korean local governments operate the participatory budgeting system autonomously. This study classifies these entities into clusters. Among diverse machine learning methodologies (Neural Network, Rule Induction (CN2), KNN, Decision Tree, Random Forest, Gradient Boosting, SVM, Naïve Bayes), the Support Vector Machine technique emerged as the most effective in the analysis of 2022 data on Korean municipalities. The first cluster, C1, is characterized by minimal committee activity but a substantial allocation of participatory budgeting; another cluster, C3, comprises cities that exhibit a passive stance. The majority of cities fall into the final cluster, C2, which is noted for its proactive engagement. Overall, most Korean local governments operate the participatory budgeting system well, and only a small number of cities are less active in it. We anticipate that analyzing time-series data from the past decade in follow-up studies will further enhance the reliability of classifying local government types regarding participatory budgeting.