• Title/Abstract/Keywords: Unsupervised clustering

Search results: 224 items (processing time 0.024 s)

RAG-based Hierarchical Classification (2)

  • 이상훈
    • 대한원격탐사학회지
    • /
    • Vol. 22, No. 6
    • /
    • pp.613-619
    • /
    • 2006
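  • This study proposes unsupervised image classification through the dendrogram of agglomerative hierarchical clustering, as a step above image segmentation in remote-sensing image processing. The proposed algorithm is a hierarchical clustering method that performs merges while searching for the set of MCSNPs (Mutual Closest Spectral Neighbor Pairs), using a RAG (Regional Adjacency Graph) defined in the spectral domain and a min-heap data structure. To improve efficiency in computation time and memory use, multiple windows in spectral space are used to define spectral adjacency, and an RNV (Region Neighbor Vector) is used to update the RAG as it changes through merging. Given an appropriate number of levels, the proposed algorithm generates a dendrogram from which the hierarchical relationships of cluster merges can be easily interpreted. The algorithm was evaluated extensively on simulated data, and the experimental results demonstrated its efficiency. In addition, results from applying it to a very large QuickBird image acquired over the Korean Peninsula show that the proposed algorithm is a powerful tool for unsupervised image classification.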
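The merge loop described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the multi-window search and the RNV bookkeeping are omitted, and the function name `merge_regions`, the integer region ids, and the squared Euclidean distance between region means are all assumptions made for illustration. The sketch keeps adjacent pairs in a min-heap and discards stale entries lazily; the smallest valid entry is always a mutual closest neighbor pair.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two spectral mean vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_regions(means, sizes, edges, n_clusters):
    """Greedy agglomerative merging over an adjacency graph (hypothetical helper).

    means: dict of integer region id -> spectral mean (tuple of floats)
    sizes: dict of integer region id -> pixel count
    edges: iterable of (i, j) pairs of adjacent regions
    Returns (final means, merge history); the history is the dendrogram data.
    """
    means, sizes = dict(means), dict(sizes)
    nbrs = {r: set() for r in means}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    heap = [(sq_dist(means[i], means[j]), i, j) for i, j in edges]
    heapq.heapify(heap)
    next_id = max(means) + 1
    history = []
    while len(means) > n_clusters and heap:
        d, i, j = heapq.heappop(heap)
        if i not in means or j not in means:
            continue  # stale entry: one of the regions was merged earlier
        n = sizes[i] + sizes[j]
        new_mean = tuple(
            (sizes[i] * a + sizes[j] * b) / n for a, b in zip(means[i], means[j])
        )
        # The merged region inherits the neighbours of both parents.
        new_nbrs = (nbrs.pop(i) | nbrs.pop(j)) - {i, j}
        for k in new_nbrs:
            nbrs[k] -= {i, j}
            nbrs[k].add(next_id)
            heapq.heappush(heap, (sq_dist(new_mean, means[k]), next_id, k))
        nbrs[next_id] = new_nbrs
        del means[i], means[j], sizes[i], sizes[j]
        means[next_id], sizes[next_id] = new_mean, n
        history.append((i, j, next_id, d))
        next_id += 1
    return means, history
```

Because region ids are never reused, a heap entry is valid exactly when both of its ids are still present, which avoids having to delete entries from the heap after each merge.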

A Clustering-based Semi-Supervised Learning through Initial Prediction of Unlabeled Data

  • 김응구;전치혁
    • 한국경영과학회지
    • /
    • Vol. 33, No. 3
    • /
    • pp.93-105
    • /
    • 2008
  • Semi-supervised learning uses a small amount of labeled data to predict the labels of unlabeled data and to improve clustering performance, whereas unsupervised learning analyzes only unlabeled data for clustering purposes. We propose a new clustering-based semi-supervised learning method that reflects the initial predicted labels of unlabeled data in the objective function. The initial prediction should be expressed as a discrete probability distribution, obtained through a classification method trained on the labeled data. As a result, clusters are formed and the labels of unlabeled data are predicted according to the information from labeled data in the same cluster. We evaluate and compare the performance of the proposed method in terms of classification error through numerical experiments with blinded labeled data.

Genomic Tree of Gene Contents Based on Functional Groups of KEGG Orthology

  • Kim Jin-Sik;Lee Sang-Yup
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 16, No. 5
    • /
    • pp.748-756
    • /
    • 2006
  • We propose a genome-scale clustering approach to identify whole-genome relationships using the functional groups given by the Kyoto Encyclopedia of Genes and Genomes Orthology (KO) database. The metabolic capabilities of each organism were defined by the number of genes in each functional category. The archaeal, bacterial, and eukaryotic genomes were compared simultaneously by applying a two-step clustering method consisting of a self-organizing tree algorithm followed by unsupervised hierarchical clustering. The clustering results were consistent with various phenotypic characteristics of the organisms analyzed and, additionally, showed a different aspect of the relationship between genomes than the one previously established through rRNA-based comparisons. The proposed approach of collecting and clustering the metabolic functional capabilities of organisms should be a useful tool for predicting relationships among organisms.

STATISTICAL NOISE BAND REMOVAL FOR SURFACE CLUSTERING OF HYPERSPECTRAL DATA

  • Huan, Nguyen Van;Kim, Hak-Il
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회, 2008 International Symposium on Remote Sensing
    • /
    • pp.111-114
    • /
    • 2008
  • Noise bands may deform the typical shape of the spectrum and thereby degrade clustering accuracy. This paper proposes a statistical approach to removing noise bands from hyperspectral data, using the correlation coefficient between bands as an indicator. If each band is treated as a random variable, two adjacent signal bands in hyperspectral data are highly correlated, whereas a noise band produces a low correlation. For clustering, the unsupervised $\kappa$-nearest neighbor clustering method is implemented with three well-accepted spectral matching measures: Euclidean distance (ED), spectral angle mapper (SAM), and spectral information divergence (SID). The paper also proposes a hierarchical scheme for combining these measures. Finally, a separability assessment based on the between-class and within-class scatter matrices is used to evaluate the applicability of the proposed noise band removal method, and the spectral matching measures are compared.
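The correlation indicator described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name `flag_noise_bands`, the threshold of 0.5, and the rule that a band is flagged only when it correlates poorly with every adjacent band are all hypothetical choices.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant band carries no correlation information
    return cov / math.sqrt(vx * vy)


def flag_noise_bands(bands, threshold=0.5):
    """Return indices of bands that correlate poorly with all adjacent bands.

    bands: list of bands, each a list of pixel values in the same pixel order.
    threshold: assumed cut-off; the source does not state a value.
    """
    flagged = []
    for b in range(len(bands)):
        neighbours = [k for k in (b - 1, b + 1) if 0 <= k < len(bands)]
        if neighbours and all(
            pearson(bands[b], bands[k]) < threshold for k in neighbours
        ):
            flagged.append(b)
    return flagged
```

Requiring low correlation with every neighbour keeps a clean band from being flagged just because it sits next to a noisy one. A known limitation of this simple rule is that a clean first or last band with a noisy neighbour would still be flagged, since it has only one neighbour to compare against.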

K-Way Graph Partitioning: A Semidefinite Programming Approach

  • Jaehwan, Kim;Seungjin, Choi;Sung-Yang, Bang
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회, 2004 Fall Conference Proceedings, Vol. 31, No. 2 (1)
    • /
    • pp.697-699
    • /
    • 2004
  • Despite many successful spectral clustering algorithms (based on the spectral decomposition of the Laplacian(1) or a stochastic matrix(2)), several problems remain unsolved. Most spectral clustering problems are based on the normalized cut algorithm(3) and are close to the classical graph partitioning problem, which is NP-hard. To obtain a good solution in polynomial time, a convex form must be established through relaxation. In this paper, we apply a novel optimization technique, semidefinite programming (SDP), to the unsupervised clustering problem and present a new multiple partitioning method. Experimental results confirm that the proposed method improves clustering performance compared with previous multiple spectral clustering methods, especially on problems that include non-compact clusters.

Improvement of SOM using Stratification

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 9, No. 1
    • /
    • pp.36-41
    • /
    • 2009
  • The self-organizing map (SOM) is an unsupervised method based on competitive learning. Many clustering studies have used SOM, and it provides a visualization of the data based on its result. The visualized result has been used in the decision process of descriptive data mining as a form of exploratory data analysis. In this paper we propose an improvement of SOM using stratified sampling from statistics. The stratification improves the performance of SOM. To verify the improvement, we perform comparative experiments using data sets from the UCI machine learning repository and simulation data.

A Design of Clustering Classification Systems using Satellite Remote Sensing Images Based on Design Patterns

  • 김동연;김진일
    • 정보처리학회논문지B
    • /
    • Vol. 9B, No. 3
    • /
    • pp.319-326
    • /
    • 2002
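  • In this paper, we design and implement a clustering classification system, an unsupervised classification technique for processing satellite images. To make it easy to support new satellite image formats and clustering techniques, and to keep the system extensible, various design patterns such as the Factory pattern and the Strategy pattern were applied. The system designs and implements the sequential clustering, K-Means clustering, ISODATA, and Fuzzy C-Means clustering techniques, and was tested with Landsat TM satellite images as classifier input. The results show that clustering techniques can be used effectively both for extracting training areas when classifying satellite images without prior knowledge and for real-time classification of satellite images, and that the resulting system offers excellent reusability and extensibility.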
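How the Strategy and Factory patterns might combine in such a system can be sketched briefly. The class names (`ClusteringStrategy`, `KMeansStrategy`, `SequentialStrategy`, `ClusteringFactory`) and the simplified algorithms are hypothetical and do not come from the paper; the K-Means variant seeds its centres with the first `k` pixels purely to keep the example short, so it assumes at least `k` pixels are supplied.

```python
from abc import ABC, abstractmethod


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ClusteringStrategy(ABC):
    """Strategy interface: each algorithm maps pixels to cluster labels."""

    @abstractmethod
    def classify(self, pixels):
        ...


class KMeansStrategy(ClusteringStrategy):
    def __init__(self, k=2, iterations=10):
        self.k, self.iterations = k, iterations

    def classify(self, pixels):
        centres = [pixels[i] for i in range(self.k)]  # naive seeding (assumption)
        labels = [0] * len(pixels)
        for _ in range(self.iterations):
            labels = [
                min(range(self.k), key=lambda c: sq_dist(p, centres[c]))
                for p in pixels
            ]
            for c in range(self.k):
                members = [p for p, lab in zip(pixels, labels) if lab == c]
                if members:
                    centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
        return labels


class SequentialStrategy(ClusteringStrategy):
    """One pass: join the nearest cluster if close enough, else start a new one."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def classify(self, pixels):
        centres, labels = [], []
        for p in pixels:
            best = min(
                range(len(centres)),
                key=lambda c: sq_dist(p, centres[c]),
                default=None,
            )
            if best is None or sq_dist(p, centres[best]) > self.threshold ** 2:
                centres.append(p)
                best = len(centres) - 1
            labels.append(best)
        return labels


class ClusteringFactory:
    """Factory: callers ask for an algorithm by name, not by class."""

    _registry = {"kmeans": KMeansStrategy, "sequential": SequentialStrategy}

    @classmethod
    def register(cls, name, strategy_cls):
        cls._registry[name] = strategy_cls

    @classmethod
    def create(cls, name, **params):
        return cls._registry[name](**params)
```

With this layout, supporting another technique such as ISODATA would mean writing one new strategy class and registering it, without touching the code that calls `ClusteringFactory.create(...).classify(pixels)`.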

THE MODIFIED UNSUPERVISED SPECTRAL ANGLE CLASSIFICATION (MUSAC) OF HYPERION, HYPERION-FLAASH AND ETM+ DATA USING UNIT VECTOR

  • Kim, Dae-Sung;Kim, Yong-Il
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회, Proceedings of ISRS 2005
    • /
    • pp.134-137
    • /
    • 2005
  • Unsupervised spectral angle classification (USAC) is an algorithm that extracts ground-object information by using the minimum 'Spectral Angle' instead of the 'Spectral Euclidean Distance' in the clustering process. In this study, our algorithm uses the unit vector instead of the spectral distance to compute the cluster mean in unsupervised classification. The proposed algorithm (MUSAC) is applied to Hyperion and ETM+ data, and the results are compared with K-Means and the former USAC algorithm (FUSAC). USAC clearly classifies water and dark forest areas and produces more accurate results than K-Means. Atmospheric correction was applied to the Hyperion data (Hyperion-FLAASH) in pursuit of more accurate results, but it had no effect on the accuracy. We therefore anticipate that the 'Spectral Angle' can be one of the most accurate classifiers for both multispectral and hyperspectral images. Furthermore, the cluster unit vector can be an efficient technique for determining each cluster mean in USAC.
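The two ingredients named in the abstract above, the spectral angle and a unit-vector cluster mean, can be sketched as follows. The reading of "unit vector mean" used here (normalise each member spectrum, sum, then normalise again) is an assumption, as are the function names; spectra are assumed to be non-zero vectors.

```python
import math


def unit(v):
    """Scale a non-zero spectrum to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def spectral_angle(a, b):
    """Angle in radians between two spectra; insensitive to overall brightness."""
    cos = sum(x * y for x, y in zip(unit(a), unit(b)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def unit_vector_mean(members):
    """Cluster direction: mean of the members' unit vectors, renormalised."""
    summed = [sum(col) for col in zip(*(unit(m) for m in members))]
    return unit(summed)


def assign(pixel, centres):
    """Index of the cluster centre with the smallest spectral angle to the pixel."""
    return min(range(len(centres)), key=lambda c: spectral_angle(pixel, centres[c]))
```

Normalising before averaging keeps a few very bright pixels from dominating the cluster direction, which an ordinary arithmetic mean of raw spectra would allow.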

Labeling Big Spatial Data: A Case Study of New York Taxi Limousine Dataset

  • AlBatati, Fawaz;Alarabi, Louai
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 6
    • /
    • pp.207-212
    • /
    • 2021
  • Clustering unlabeled spatial datasets to convert them into labeled spatial datasets is a challenging task, especially for geographical information systems. In this study, we investigated the NYC Taxi and Limousine Commission dataset and found that all of its spatio-temporal trajectories are unlabeled, which makes them unsuitable for data mining tasks such as classification and regression. It is therefore necessary to convert the unlabeled spatial datasets into labeled ones, and we use a clustering technique to do this for all the trajectory datasets. A key difficulty in applying machine learning classification algorithms in many applications is that they require large amounts of labeled data, and labeling big data is in many cases a costly process. In this paper, we show the effectiveness of using a clustering technique to label spatial data, which leads to a high-accuracy classifier.

Discovering Community Interests Approach to Topic Model with Time Factor and Clustering Methods

  • Ho, Thanh;Thanh, Tran Duy
    • Journal of Information Processing Systems
    • /
    • Vol. 17, No. 1
    • /
    • pp.163-177
    • /
    • 2021
  • Many methods of discovering social networking communities or clustering features are based on the network structure or the content network. This paper proposes a community discovery method based on topic models that uses a time factor and an unsupervised clustering method. Online community discovery enables organizations and businesses to thoroughly understand trends in users' interest in their products and services. In addition, insight into customer experience on social networks is a tremendous competitive advantage in this era of e-commerce and Internet development. The objective of this work is to find clusters (communities) such that each cluster's nodes contain topics and individuals with similarities in the attribute space. In terms of social media analytics, the method seeks communities whose members have similar features. The method is tested and evaluated on a Vietnamese corpus of comments and messages collected from social networks and e-commerce sites in various sectors from 2016 to 2019. The experimental results demonstrate the effectiveness of the proposed method over other methods.