• Title/Summary/Keyword: 클러스터링 앙상블

Search Result 10, Processing Time 0.024 seconds

Data Fusion, Ensemble and Clustering for the Severity Classification of Road Traffic Accident in Korea (데이터융합, 앙상블과 클러스터링을 이용한 교통사고 심각도 분류분석)

  • 손소영;이성호
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2000.04a
    • /
    • pp.597-600
    • /
    • 2000
  • 계속적인 증가 추세를 보이고 있는 교통량으로 인해 환경 문제뿐 아니라 교통사고로 인한 사상자 및 물적피해가 상당량으로 집계되고 있다. 본 논문에서는 데이터융합 및 앙상블 클러스터링방법을 이용한 교통사고 심각도 분류분석방법을 제안함으로서 교통사고예방에 기여하고자 한다. 이를 위하여 신경망과 Decision-Tree기법을 이용하여 얻은 물적피해와 신체상해가 발생할 확률을 융합하는 전형적인 데이터 융합기법(템스터-쉐퍼, 베이지안 방법, 로지스틱융합방법)을 사용하였다. 또한, 분류정확도를 향상시키고자 Bootstrap 재추출 방법을 이용해 얻어진 여러 개의 분류예측 결과 중 다수의 분류결과를 선택하는 앙상블 (arcing, bagging)기법을 적용하였다. 더불어, 본 연구에서는 클러스터링 방법을 제시하고, 이 방법이 기존의 융합기법, 앙상블기법과 비교한 결과, 분류예측면에서 정확도가 향상됨을 보였다.

  • PDF

Association-rule based ensemble clustering for adopting a prior knowledge (사전정보 활용을 위한 관련 규칙 기반의 Ensemble 클러스터링)

  • Go, Song;Kim, Dae-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.11a
    • /
    • pp.67-70
    • /
    • 2007
  • 본 논문은 클러스터링 문제에서 사전 정보에 대한 활용의 효율을 개선시킬 수 있는 방법을 제안한다. 클러스터링에서 사전 정보의 존재 시 이의 활용은 성능을 개선시킬 수 있는 계기가 될 수 있으므로 그의 활용 폭을 늘리기 위한 방법으로 다양한 사용 방법의 적용인 semi-supervised 클러스터링 앙상블을 제안한다. 사전 정보의 활용 방법의 방안으로써 association-rule의 개념을 접목하였다. 클러스터 수를 다르게 적용하더라도 패턴간의 유사도가 높으면 같은 그룹에 속할 확률은 높아진다. 다양한 초기화에 따른 클러스터의 동작은 사전 정보의 활용을 다양화 시키게 되며, 사전 정보에 충족하는 각각의 클러스터 결과를 제시한다. 결과를 총 취합하여 association-matrix를 형성하면 패턴간의 유사도를 얻을 수 있으며 결국 association-matrix를 통해 클러스터링 할 수 있는 방법을 제시한다.

  • PDF

An Ensemble Clustering Algorithm based on a Prior Knowledge (사전정보를 활용한 앙상블 클러스터링 알고리즘)

  • Ko, Song;Kim, Dae-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.2
    • /
    • pp.109-121
    • /
    • 2009
  • Although a prior knowledge is a factor to improve the clustering performance, it is dependant on how to use of them. Especial1y, when the prior knowledge is employed in constructing initial centroids of cluster groups, there should be concerned of similarities of a prior knowledge. Despite labels of some objects of a prior knowledge are identical, the objects whose similarities are low should be separated. By separating them, centroids of initial group were not fallen in a problem which is collision of objects with low similarities. There can use the separated prior knowledge by various methods such as various initializations. To apply association rule, proposed method makes enough cluster group number, then the centroids of initial groups could constructed by separated prior knowledge. Then ensemble of the various results outperforms what can not be separated.

Context-Aware Fusion with Support Vector Machine (Support Vector Machine을 이용한 문맥 인지형 융합)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.6
    • /
    • pp.19-26
    • /
    • 2014
  • An ensemble classifier system is a widely-used multi-classifier system, which combines the results from each classifier and, as a result, achieves better classification result than any single classifier used. Several methods have been used to build an ensemble classifier including boosting, which is a cascade method where misclassified examples in previous stage are used to boost the performance in current stage. Boosting is, however, a serial method which does not form a complete feedback loop. In this paper, proposed is context sensitive SVM ensemble (CASE) which adopts SVM, one of the best classifiers in term of classification rate, as a basic classifier and clustering method to divide feature space into contexts. As CASE divides feature space and trains SVMs simultaneously, the result from one component can be applied to the other and CASE achieves better result than boosting. Experimental results prove the usefulness of the proposed method.

Heterogeneous Clustering Ensemble Method using Evolutionary Approach with Different Cluster Results (다양한 클러스터 결과에 의해 진화적 접근법을 사용하는 이종 클러스터링 앙상블 기법)

  • Yoon Hye-Sung;Ahn Sun-Young;Lee Sang-Ho;Cho Sung-Bum;Kim Ju-Han
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06a
    • /
    • pp.16-18
    • /
    • 2006
  • 데이터마이닝 기법의 클러스터링 알고리즘은 생물정보학에서 데이터 셋의 사전 정보를 고려하지 않고 중요한 유전적, 생물학적 상호작용을 찾기 위하여 적용되고 있다. 그러나 다양한 형식의 수많은 알고리즘들은 바이오데이터의 다양한 특성들과 실험의 가정 때문에 다른 클러스터링 결과들을 만들 수 있다. 본 논문에서는 바이오 데이터 셋의 특성에도 적합하면서 양질의 클러스터링 결과를 만들기 위한 새로운 방법을 제안한다. 이 방법은 여러 가지 클러스터링 알고리즘의 결과들을 유전자 알고리즘의 기본 개념인 진화적 환경에서 가장 적합한 형질을 선택하는 문제와 결합하였다. 그리고 실제 데이터 셋을 이용하여 우리의 제안하는 방법을 증명하고 실험 결과로 최적의 클러스터 결과를 보인다.

  • PDF

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.11
    • /
    • pp.435-442
    • /
    • 2018
  • Outlier detection means to detect data samples that deviate significantly from the distribution of normal data. Most outlier detection methods calculate an outlier score that indicates the extent to which a data sample is out of normal state and determine it to be an outlier when its outlier score is above a given threshold. However, since the range of an outlier score is different for each data and the outliers exist at a smaller ratio than the normal data, it is very difficult to determine the threshold value for an outlier score. Further, in an actual situation, it is not easy to acquire data including a sufficient amount of outliers available for learning. In this paper, we propose a clustering-based outlier detection method by constructing a model representing a normal data region using only normal data and performing binary classification of outliers and normal data for new data samples. Then, by dividing the given normal data into chunks, and constructing a clustering model for each chunk, we expand it to the ensemble method combining the decision by the models and apply it to the streaming data with dynamic changes. Experimental results using real data and artificial data show high performance of the proposed method.

Defect Severity-based Ensemble Model using FCM (FCM을 적용한 결함심각도 기반 앙상블 모델)

  • Lee, Na-Young;Kwon, Ki-Tae
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.12
    • /
    • pp.681-686
    • /
    • 2016
  • Software defect prediction is an important factor in efficient project management and success. The severity of the defect usually determines the degree to which the project is affected. However, existing studies focus only on the presence or absence of a defect and not the severity of defect. In this study, we proposed an ensemble model using FCM based on defect severity. The severity of the defect of NASA data set's PC4 was reclassified. To select the input column that affected the severity of the defect, we extracted the important defect factor of the data set using Random Forest (RF). We evaluated the performance of the model by changing the parameters in the 10-fold cross-validation. The evaluation results were as follows. First, defect severities were reclassified from 58, 40, 80 to 30, 20, 128. Second, BRANCH_COUNT was an important input column for the degree of severity in terms of accuracy and node impurities. Third, smaller tree number led to more variables for good performance.

Improving the Performance of Deep-Learning-Based Ground-Penetrating Radar Cavity Detection Model using Data Augmentation and Ensemble Techniques (데이터 증강 및 앙상블 기법을 이용한 딥러닝 기반 GPR 공동 탐지 모델 성능 향상 연구)

  • Yonguk Choi;Sangjin Seo;Hangilro Jang;Daeung Yoon
    • Geophysics and Geophysical Exploration
    • /
    • v.26 no.4
    • /
    • pp.211-228
    • /
    • 2023
  • Ground-penetrating radar (GPR) surveys are commonly used to monitor embankments, which is a nondestructive geophysical method. The results of GPR surveys can be complex, depending on the situation, and data processing and interpretation are subject to expert experiences, potentially resulting in false detection. Additionally, this process is time-intensive. Consequently, various studies have been undertaken to detect cavities in GPR survey data using deep learning methods. Deep-learning-based approaches require abundant data for training, but GPR field survey data are often scarce due to cost and other factors constaining field studies. Therefore, in this study, a deep- learning-based model was developed for embankment GPR survey cavity detection using data augmentation strategies. A dataset was constructed by collecting survey data over several years from the same embankment. A you look only once (YOLO) model, commonly used in computer vision for object detection, was employed for this purpose. By comparing and analyzing various strategies, the optimal data augmentation approach was determined. After initial model development, a stepwise process was employed, including box clustering, transfer learning, self-ensemble, and model ensemble techniques, to enhance the final model performance. The model performance was evaluated, with the results demonstrating its effectiveness in detecting cavities in embankment GPR survey data.

Data Fusion, Ensemble and Clustering for the Severity Classification of Road Traffic Accident in Korea (데이터융합, 앙상블과 클러스터링을 이용한 교통사고 심각도 분류분석)

  • Sohn, So-Young;Lee, Sung-Ho
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.26 no.4
    • /
    • pp.354-362
    • /
    • 2000
  • Increasing amount of road tragic in 90's has drawn much attention in Korea due to its influence on safety problems. Various types of data analyses are done in order to analyze the relationship between the severity of road traffic accident and driving conditions based on traffic accident records. Accurate results of such accident data analysis can provide crucial information for road accident prevention policy. In this paper, we apply several data fusion, ensemble and clustering algorithms in an effort to increase the accuracy of individual classifiers for the accident severity. An empirical study results indicated that clustering works best for road traffic accident classification in Korea.

  • PDF

Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees (가중치 기반 Bag-of-Feature와 앙상블 결정 트리를 이용한 정지 영상에서의 인간 행동 인식)

  • Hong, June-Hyeok;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.1
    • /
    • pp.1-9
    • /
    • 2013
  • This paper propose a human action recognition method that uses bag-of-features (BoF) based on CS-LBP (center-symmetric local binary pattern) and a spatial pyramid in addition to the random forest classifier. To construct the BoF, an image divided into dense regular grids and extract from each patch. A code word which is a visual vocabulary, is formed by k-means clustering of a random subset of patches. For enhanced action discrimination, local BoF histogram from three subdivided levels of a spatial pyramid is estimated, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to Standford Action 40 including various human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real-time.