• Title/Summary/Keyword: Support Vector Clustering

Search Result 89, Processing Time 0.029 seconds

Support Vector Data Description using Mean Shift Clustering (평균 이동 알고리즘 기반의 지지 벡터 영역 표현 방법)

  • Chang, Hyung-Jin;Kim, Pyo-Jae;Choi, Jung-Hwan;Choi, Jin-Young
    • Proceedings of the KIEE Conference
    • /
    • 2007.04a
    • /
    • pp.307-309
    • /
    • 2007
  • SVDD의 scale prob1em을 해결하기 위하여, 학습 데이터를 sub-groupings하여 group 단위로 SVDD를 통해 학습함으로써 학습 시간을 줄이는, K-means clustering을 이용한 SVDD 방범(KMSVDD)이 제안되었다. 하지만 KMSVDD는 K-means clustering 알고리즘의 본질상 최적의 K값을 정하기 힘들다는 문제와, 동일한 데이터를 학습할지라도 clustered group이 램덤하게 형성되기 때문에 매번 학습의 결과가 달라지는 문제점이 있었다. 또한 데이터의 분포 상태와 관계없이 무조건 타원(dlliptic) 형태의 K개의 cluster로 나누기 때문에 각각의 나눠진 cluster들은 데이터 분포에 대한 특징을 나타내기 힘들게 된다. 이러한 문제점을 해결하기 위하여 본 논문에서는 데이터 분포에서 mode를 먼저 찾은 후 이 mode를 기준으로 clustering하는 Mean Shift clustering 방법을 이용한 SVDD를 제안하고자 한다. 제안된 알고리즘은 KMSVDD와 비교해 데이터 학습 속도에서는 큰 차이가 없으면서도 데이터의 분포 상태를 고려한 형태로 clustering 한 sub-group을 학습하므로 학습의 정확도가 일정하게 되며, 각각의 cluster는 데이터 분표의 특징을 포함하는 효과가 있다. 또한 Mean Shift Kernel의 bandwidth의 결정은 K-Means의 K와는 달리 어느 정도 여유를 갖고 결정되어도 학습 결과에는 차이가 없다. 다양한 데이터들을 이용한 모의실험을 통하여 위의 내용들을 검증하도록 한다.

  • PDF

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science
    • /
    • v.23 no.3
    • /
    • pp.295-322
    • /
    • 2012
  • In this paper, we cluster similar word senses applying vector space model and HAL (Hyperspace Analog to Language). HAL measures corelation among words through a certain size of context (Lund and Burgess 1996). The similarity measurement between a word pair is cosine similarity based on the vector space model, which reduces distortion of space between high frequency words and low frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce a large amount of dimensions caused by similarity matrix. For sense similarity clustering, we adopt supervised and non-supervised learning methods. For non-supervised method, we use clustering. For supervised method, we use SVM (Support Vector Machine), Naive Bayes Classifier, and Maximum Entropy Method.

  • PDF

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.56-63
    • /
    • 2015
  • Several fields of science have demanded large-scale workflow support, which requires thousands of CPU cores or more. In order to support such large-scale scientific workflows, high capacity parallel systems such as supercomputers are widely used. In order to increase the utilization of these systems, most schedulers use backfilling policy: Small jobs are moved ahead to fill in holes in the schedule when large jobs do not delay. Since an estimate of the runtime is necessary for backfilling, most parallel systems use user's estimated runtime. However, it is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, we propose a novel system for the runtime prediction based on workload-aware clustering with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, a feature selection based on factor analysis is performed to identify important input features. Then, it performs a clustering analysis of history data based on self-organizing map which is followed by hierarchical clustering for finding the clustering boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression with the clustered workload data. Multiple prediction models for each clustered data pattern can reduce the error rate compared with a single model for the whole data pattern. In the experiments, we use workload logs on parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Comparing with other techniques, experimental results show that the proposed method improves the accuracy up to 69.08%.

Adaptive Intrusion Detection System Based on SVM and Clustering (SVM과 클러스터링 기반 적응형 침입탐지 시스템)

  • Lee, Han-Sung;Im, Young-Hee;Park, Joo-Young;Park, Dai-Hee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.2
    • /
    • pp.237-242
    • /
    • 2003
  • In this paper, we propose a new adaptive intrusion detection algorithm based on clustering: Kernel-ART, which is composed of the on-line clustering algorithm, ART (adaptive resonance theory), combining with mercer-kernel and concept vector. Kernel-ART is not only satisfying all desirable characteristics in the context of clustering-based IDS but also alleviating drawbacks associated with the supervised learning IDS. It is able to detect various types of intrusions in real-time by means of generating clusters incrementally.

Estimation of residual stress in welding of dissimilar metals at nuclear power plants using cascaded support vector regression

  • Koo, Young Do;Yoo, Kwae Hwan;Na, Man Gyun
    • Nuclear Engineering and Technology
    • /
    • v.49 no.4
    • /
    • pp.817-824
    • /
    • 2017
  • Residual stress is a critical element in determining the integrity of parts and the lifetime of welded structures. It is necessary to estimate the residual stress of a welding zone because residual stress is a major reason for the generation of primary water stress corrosion cracking in nuclear power plants. That is, it is necessary to estimate the distribution of the residual stress in welding of dissimilar metals under manifold welding conditions. In this study, a cascaded support vector regression (CSVR) model was presented to estimate the residual stress of a welding zone. The CSVR model was serially and consecutively structured in terms of SVR modules. Using numerical data obtained from finite element analysis by a subtractive clustering method, learning data that explained the characteristic behavior of the residual stress of a welding zone were selected to optimize the proposed model. The results suggest that the CSVR model yielded a better estimation performance when compared with a classic SVR model.

Creating Level Set Trees Using One-Class Support Vector Machines (One-Class 서포트 벡터 머신을 이용한 레벨 셋 트리 생성)

  • Lee, Gyemin
    • Journal of KIISE
    • /
    • v.42 no.1
    • /
    • pp.86-92
    • /
    • 2015
  • A level set tree provides a useful representation of a multidimensional density function. Visualizing the data structure as a tree offers many advantages for data analysis and clustering. In this paper, we present a level set tree estimation algorithm for use with a set of data points. The proposed algorithm creates a level set tree from a family of level sets estimated over a whole range of levels from zero to infinity. Instead of estimating density function then thresholding, we directly estimate the density level sets using one-class support vector machines (OC-SVMs). The level set estimation is facilitated by the OC-SVM solution path algorithm. We demonstrate the proposed level set tree algorithm on benchmark data sets.

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.395-406
    • /
    • 2017
  • The purpose of this study is to identify the pattern of daily electricity demand through clustering and classification. The hourly data was collected by KPS (Korea Power Exchange) between 2008 and 2012. The time trend was eliminated for conducting the pattern of daily electricity demand because electricity demand data is times series data. We have considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship between external factors, day of the week, holiday, and weather. Data was divided into training data and test data. Training data consisted of external factors and clustered number between 2008 and 2011. Test data was daily data of external factors in 2012. Decision tree, random forest, Support vector machine, and Naive Bayes were used. As a result, Gaussian model based clustering and random forest showed the best prediction performance when the number of cluster was 8.

Fuzzy c-Regression Using Weighted LS-SVM

  • Hwang, Chang-Ha
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 2005.10a
    • /
    • pp.161-169
    • /
    • 2005
  • In this paper we propose a fuzzy c-regression model based on weighted least squares support vector machine(LS-SVM), which can be used to detect outliers in the switching regression model while preserving simultaneous yielding the estimates of outputs together with a fuzzy c-partitions of data. It can be applied to the nonlinear regression which does not have an explicit form of the regression function. We illustrate the new algorithm with examples which indicate how it can be used to detect outliers and fit the mixed data to the nonlinear regression models.

  • PDF

Switching Regression Analysis via Fuzzy LS-SVM

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.2
    • /
    • pp.609-617
    • /
    • 2006
  • A new fuzzy c-regression algorithm for switching regression analysis is presented, which combines fuzzy c-means clustering and least squares support vector machine. This algorithm can detect outliers in switching regression models while yielding the simultaneous estimates of the associated parameters together with a fuzzy c-partitions of data. It can be employed for the model-free nonlinear regression which does not assume the underlying form of the regression function. We illustrate the new approach with some numerical examples that show how it can be used to fit switching regression models to almost all types of mixed data.

  • PDF

Multi-Radial Basis Function SVM Classifier: Design and Analysis

  • Wang, Zheng;Yang, Cheng;Oh, Sung-Kwun;Fu, Zunwei
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.6
    • /
    • pp.2511-2520
    • /
    • 2018
  • In this study, Multi-Radial Basis Function Support Vector Machine (Multi-RBF SVM) classifier is introduced based on a composite kernel function. In the proposed multi-RBF support vector machine classifier, the input space is divided into several local subsets considered for extremely nonlinear classification tasks. Each local subset is expressed as nonlinear classification subspace and mapped into feature space by using kernel function. The composite kernel function employs the dual RBF structure. By capturing the nonlinear distribution knowledge of local subsets, the training data is mapped into higher feature space, then Multi-SVM classifier is realized by using the composite kernel function through optimization procedure similar to conventional SVM classifier. The original training data set is partitioned by using some unsupervised learning methods such as clustering methods. In this study, three types of clustering method are considered such as Affinity propagation (AP), Hard C-Mean (HCM) and Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA). Experimental results on benchmark machine learning datasets show that the proposed method improves the classification performance efficiently.