• Title/Summary/Keyword: time series clustering

Search Result 185

Identification of Fuzzy Inference Systems Using a Multi-objective Space Search Algorithm and Information Granulation

  • Huang, Wei;Oh, Sung-Kwun;Ding, Lixin;Kim, Hyun-Ki;Joo, Su-Chong
    • Journal of Electrical Engineering and Technology / v.6 no.6 / pp.853-866 / 2011
  • We propose a multi-objective space search algorithm (MSSA) and use it, together with information granulation (IG), to identify fuzzy inference systems. The MSSA is a multi-objective optimization algorithm whose search method is based on analysis of the solution space; its multi-objective mechanism is realized with a non-dominated sorting strategy. In identifying the fuzzy inference system, the MSSA carries out both parametric and structural optimization of the fuzzy model, and information granulation is performed with the C-Means clustering algorithm. The overall optimization comprises two identification mechanisms: structure identification (the number of input variables to be used, the specific subset of input variables, the number of membership functions, and the polynomial type) and parameter identification (the apexes of the membership functions). Structure identification is carried out by the MSSA and C-Means, whereas parameter identification is realized via the MSSA and the least squares method. The performance of the proposed model was evaluated on three representative numerical examples: gas furnace data, NOx emission process data, and the Mackey-Glass time series. The proposed model was also compared with several "conventional" fuzzy models reported in the literature.

Health State Clustering and Prediction Based on Bayesian HMM (Bayesian HMM 기반의 건강 상태 분류 및 예측)

  • Sin, Bong-Kee
    • Journal of KIISE / v.44 no.10 / pp.1026-1033 / 2017
  • In this paper, a Bayesian modeling and duration-based prediction method is proposed for health clinic time series data using the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM). The HDP-HMM is a Bayesian extension of the HMM that can find the optimal number of health states, a number that is highly uncertain and difficult to estimate in the context of health dynamics. Tests of the HDP-HMM on simulated data and real health clinic data show interesting modeling behaviors and promising prediction performance over spans of up to five years. Future health changes are uncertain and inherently difficult to predict, but experimental results on health clinic data suggest that practical long-term prediction is possible and can be made useful if multiple hypotheses are presented given the dynamic contexts defined by HMM states.

An Alert Data Mining Framework for Intrusion Detection System (침입탐지시스템의 경보데이터 분석을 위한 데이터 마이닝 프레임워크)

  • Shin, Moon-Sun
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.1 / pp.459-466 / 2011
  • In this paper, we propose a data mining framework for managing alerts in order to improve the performance of intrusion detection systems. The proposed alert data mining framework performs alert correlation analysis using mining tasks such as axis-based association rules, axis-based frequent episodes, and order-based clustering. It also classifies false alarms in order to reduce their number. We analyzed the characteristics of the proposed system through its implementation and evaluation. The framework can discover unknown patterns in the alerts. It can also be applied to predict attacks in progress, to understand the logical steps and strategies behind a series of attacks using sequences of clusters, and to classify false alerts from the intrusion detection system. The final rules generated by the framework can be used for the real-time response of the intrusion detection system.

Anomaly Detection in Sensor Data

  • Kim, Jong-Min;Baik, Jaiwook
    • Journal of Applied Reliability / v.18 no.1 / pp.20-32 / 2018
  • Purpose: The purpose of this study is to set up anomaly detection criteria for sensor data coming from a motorcycle. Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) are obtained every 0.02 seconds from a motorcycle. Exploratory data analysis is used to find patterns in the data. Traditional process control methods such as the X control chart, as well as time series models, are fitted to find anomalous behavior in the data. Finally, an unsupervised learning algorithm, k-means clustering, is used to find anomalous spots in the sensor data. Results: According to the exploratory data analysis, the distribution of accelerator pedal sensor values is heavily skewed to the left. The motorcycle appears to have been driven in a city at speeds below 45 kilometers per hour. Traditional process control charts such as the X control chart fail because of severe autocorrelation in each sensor series. However, an ARIMA model found three abnormal points lying beyond the 2 sigma limits of the control chart. We also applied a copula-based Markov chain to perform statistical process control for correlated observations; it found anomalous behavior in places similar to those found by the ARIMA model. With the unsupervised learning algorithm, large sensor values are subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones that need to be tracked down for signs of anomalous behavior. Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts based on ARIMA and on Joe's copula-based Markov model give warnings at similar places in the data. The unsupervised learning algorithm shows that extreme sensor values are the ones that need to be tracked down for signs of anomalous behavior.
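
The residual-based control limit mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order autoregressive fit, AR(1), as a simple stand-in for the ARIMA model (the abstract does not state the model order), and the function name and sample data are hypothetical. The point is only that limits are applied to one-step-ahead residuals rather than to the autocorrelated raw values.

```python
from statistics import mean, pstdev

def ar1_residual_outliers(x, k=2.0):
    """Return indices of points whose AR(1) residual lies beyond k sigma.

    Assumption: an AR(1) model fitted by least squares stands in for
    the ARIMA model named in the abstract.
    """
    prev, curr = x[:-1], x[1:]
    mp, mc = mean(prev), mean(curr)
    # Least-squares slope (phi) and intercept (c0) of x[t] on x[t-1].
    sxx = sum((p - mp) ** 2 for p in prev)
    phi = sum((p - mp) * (c - mc) for p, c in zip(prev, curr)) / sxx
    c0 = mc - phi * mp
    # One-step-ahead residuals; resid[t] belongs to x[t + 1].
    resid = [c - (c0 + phi * p) for p, c in zip(prev, curr)]
    sigma = pstdev(resid)
    # Residuals have zero mean by construction, so limits are +/- k * sigma.
    return [t + 1 for t, r in enumerate(resid) if abs(r) > k * sigma]

# Hypothetical sensor trace with one injected spike at index 30.
signal = [10 + (i % 4) * 0.1 for i in range(60)]
signal[30] = 20.0
print(ar1_residual_outliers(signal))
```

With k=2.0 the limits correspond to the 2 sigma rule quoted above; a wider k trades missed anomalies for fewer false alarms.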

A Study on the Deduction of Social Issues Applying Word Embedding: With an Empasis on News Articles related to the Disables (단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로)

  • Choi, Garam;Choi, Sung-Pil
    • Journal of the Korean Society for information Management / v.35 no.1 / pp.231-250 / 2018
  • In this paper, we propose a new methodology for extracting and formalizing subjective topics at a specific time using a set of keywords extracted automatically from online news articles. We first extracted a set of keywords by applying TF-IDF, which was selected through a series of comparative experiments on statistical weighting schemes that measure the importance of individual words in a large set of texts. To calculate the semantic relations between the extracted keywords effectively, a set of word embedding vectors was constructed from about 1,000,000 separately collected news articles. The extracted keywords were represented as numerical vectors and clustered with the K-means algorithm. A qualitative in-depth analysis of the resulting keyword clusters showed that most clusters formed appropriate topics, with enough semantic concentration that labels could be assigned to them easily.
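
The keyword-extraction step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the common weighting tf × log(N/df), whereas the abstract does not say which TF-IDF variant was finally chosen, and the function name and tiny tokenized corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-weighted terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results

# Hypothetical, already-tokenized articles.
docs = [
    "welfare policy welfare budget news".split(),
    "employment policy employment quota news".split(),
    "transport access transport news".split(),
]
print(tfidf_keywords(docs, top_n=1))
# → [['welfare'], ['employment'], ['transport']]
```

Terms that occur in every document (here "news") get weight zero, which is why they never surface as keywords.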

Classification of Land Cover over the Korean Peninsula Using Polar Orbiting Meteorological Satellite Data (극궤도 기상위성 자료를 이용한 한반도의 지면피복 분류)

  • Suh, Myoung-Seok;Kwak, Chong-Heum;Kim, Hee-Soo;Kim, Maeng-Ki
    • Journal of the Korean earth science society / v.22 no.2 / pp.138-146 / 2001
  • The land cover over the Korean peninsula was classified using multi-temporal NOAA/AVHRR (Advanced Very High Resolution Radiometer) data. Four types of phenological data derived from the 10-day composited NDVI (Normalized Difference Vegetation Index), maximum and annual mean land surface temperature, and topographical data were used both to reduce the data volume and to increase classification accuracy. A self-organizing feature map (SOFM), a kind of neural network technique, was used to cluster the satellite data, and a decision tree was used to classify the clusters. When the classification results were compared with the NDVI time series and other available ground truth data, urban areas, agricultural areas, deciduous trees, and evergreen trees were clearly classified.

A Study on Fuzzy Set-based Polynomial Neural Networks Based on Evolutionary Data Granulation (Evolutionary Data Granulation 기반으로한 퍼지 집합 다항식 뉴럴 네트워크에 관한 연구)

  • 노석범;안태천;오성권
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.10a / pp.433-436 / 2004
  • In this paper, we introduce a new Fuzzy Polynomial Neural Network (FPNN)-like structure whose neurons are based on the Fuzzy Set-based Fuzzy Inference System (FS-FIS), in contrast to FPNNs based on the Fuzzy Relation-based Fuzzy Inference System (FR-FIS), and we discuss the capabilities of this new structure, named Fuzzy Set-based Polynomial Neural Networks (FSPNN). The premise parts of the fuzzy rules of the two networks (FPNN and FSPNN) are not identical, while their consequent parts are identical. This difference results from how the input space of the system is partitioned. In other words, from the FS-FIS point of view the input variables are mutually independent in the input space, while from the FR-FIS point of view they are related to each other. The proposed design procedure for the network architecture involves selecting appropriate nodes with specific local characteristics, such as the number of input variables, the order of the polynomial (constant, linear, quadratic, or modified quadratic) used as the consequent part of the fuzzy rules, and the specific subset of input variables. In the parameter optimization phase, we adopt Information Granulation (IG) based on the HCM clustering algorithm and standard least squares learning. Through this successive structural and parametric optimization, an optimized and flexible fuzzy neural network is generated dynamically. To evaluate the performance of the genetically optimized FSPNN (gFSPNN), the model is tested on the gas furnace process time series dataset.

Analysis of Pattern Change of Real Transaction Price of Apartment in Seoul (서울시 아파트 실거래가의 변화패턴 분석)

  • Kim, Jung Hee
    • Journal of Korean Society for Geospatial Information Science / v.22 no.1 / pp.63-70 / 2014
  • This study analyzes the impact of geography and timing on the real transaction prices of apartment complexes in Seoul, using data provided by the Ministry of Land, Infrastructure and Transport. The average real transaction prices and location data of the apartment complexes were combined into GIS data. First, the pattern of change in apartment real transaction prices by period and by area was analyzed by kriging, a spatial interpolation technique. Second, to analyze the pattern of apartment market price change by administrative district (the administrative 'Dong' unit), the average market price per unit area was calculated and converted to a Moran's I value, which was used to analyze the degree of clustering of the real transaction prices. Through this analysis, the spatio-temporal distribution pattern can be identified and the type of change can be forecast. This study can therefore serve as baseline research for housing or local policies. In addition, regional imbalances in apartment prices can be shown by analyzing the vertical pattern of change in the time series and the horizontal pattern of change based on GIS.
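
The clustering measure named in this abstract, global Moran's I, can be written as I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all spatial weights. The sketch below is a minimal illustration; the binary adjacency weights and the four-district price values are hypothetical and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for values with an n x n spatial weight matrix."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den

# Hypothetical example: four districts in a row, neighbors share a border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 5, 5], adjacency))  # similar prices adjacent: positive
print(morans_i([1, 5, 1, 5], adjacency))  # alternating prices: negative
```

Values near +1 indicate that similar prices cluster together, values near −1 indicate a checkerboard pattern, and values near zero indicate no spatial pattern.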

The Design of Polynomial RBF Neural Network by Means of Fuzzy Inference System and Its Optimization (퍼지추론 기반 다항식 RBF 뉴럴 네트워크의 설계 및 최적화)

  • Baek, Jin-Yeol;Park, Byaung-Jun;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.2 / pp.399-406 / 2009
  • In this study, a Polynomial Radial Basis Function Neural Network (pRBFNN) based on a Fuzzy Inference System is designed, and its parameters, such as the learning rate, momentum coefficient, and distributed weight (width of the RBF), are optimized by means of Particle Swarm Optimization. From the viewpoint of fuzzy rules in 'If-then' form, the proposed model can be expressed as three functional modules: a condition part, a conclusion part, and an inference part. In the condition part of the pRBFNN, the input space is partitioned by defining kernel functions (RBFs), whose structure is generated by the HCM clustering algorithm. We use Gaussian and inverse multiquadratic RBFs; in addition, a conic RBF is proposed and used as a kernel function. To reflect the characteristics of the dataset when partitioning the input space, we set the width of each RBF from the standard deviation of the dataset. In the conclusion part, the connection weights of the pRBFNN are represented as polynomials, which extends the general RBF neural network with constant connection weights. Finally, the output of the model is determined by the fuzzy inference of the inference part. To evaluate the proposed model, a nonlinear function with two inputs, a wastewater dataset, and the gas furnace time series dataset are used, and the results of the pRBFNN are compared with those of previous models. Both approximation and generalization abilities are discussed based on these results.

Hedging effectiveness of KOSPI200 index futures through VECM-CC-GARCH model (벡터오차수정모형과 다변량 GARCH 모형을 이용한 코스피200 선물의 헷지성과 분석)

  • Kwon, Dongan;Lee, Taewook
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1449-1466 / 2014
  • In this paper, we consider a hedge portfolio based on futures on an underlying asset. A classical way to estimate the hedge ratio for a portfolio of spot and futures positions is regression analysis. However, regression analysis cannot reflect the long-run equilibrium between spot and futures prices or the volatility clustering in the conditional variance of financial time series. To overcome these shortcomings, we analyzed the KOSPI200 index and its futures using a VECM-CC-GARCH model and computed the hedge ratio from the estimated conditional variance-covariance matrix. In the real data analysis, we compared the regression and VECM-CC-GARCH models in terms of hedging effectiveness based on the variance, value at risk, and expected shortfall of the log-returns of the hedge portfolio. The empirical results show that the multivariate GARCH models significantly outperform regression analysis and improve hedging effectiveness in periods of high volatility.
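
The classical regression baseline referred to in this abstract can be sketched briefly. The slope from regressing spot returns on futures returns equals Cov(spot, futures) / Var(futures), and variance-based hedging effectiveness is commonly measured as 1 − Var(hedged) / Var(unhedged). The function names and return series below are hypothetical. The sketch covers only the constant-ratio baseline, not the time-varying VECM-CC-GARCH ratio studied in the paper.

```python
def variance(xs):
    """Population variance of a sequence of returns."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def ols_hedge_ratio(spot, fut):
    """Constant hedge ratio: slope of spot returns regressed on futures returns."""
    ms, mf = sum(spot) / len(spot), sum(fut) / len(fut)
    cov = sum((s - ms) * (f - mf) for s, f in zip(spot, fut)) / len(spot)
    return cov / variance(fut)

def hedging_effectiveness(spot, fut, h):
    """Share of spot variance removed by shorting h futures per unit of spot."""
    hedged = [s - h * f for s, f in zip(spot, fut)]
    return 1 - variance(hedged) / variance(spot)

# Hypothetical daily log-returns.
fut = [0.010, -0.020, 0.030, -0.010, 0.020]
spot = [0.011, -0.017, 0.026, -0.010, 0.019]
h = ols_hedge_ratio(spot, fut)
print(h, hedging_effectiveness(spot, fut, h))
```

A model-based approach would replace the single ratio h with a sequence h_t computed from the conditional covariance and variance at each time step, then evaluate effectiveness in the same way.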