• Title/Summary/Keyword: Agglomerative

Search Result 68, Processing Time 0.028 seconds

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San;Kim, Chansoo;Warde, William D.
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.2
    • /
    • pp.369-380
    • /
    • 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering or close to original clustering. The recovery level of using Euclidean distance and distances transformed from correlation coefficients is evaluated and compared using Rand's (1971) C statistic. The C values present how the resultant clustering is close to the original clustering. In simulation study, the recovery level is improved by applying the correlation coefficients between objects. Using the data set from Spellman et al. (1998), the recovery levels with different similarity measures are also presented. In general, the recovery level of true clusters was increased by using the correlation coefficients.

Optimal Fuzzy Models with the Aid of SAHN-based Algorithm

  • Lee Jong-Seok;Jang Kyung-Won;Ahn Tae-Chon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.6 no.2
    • /
    • pp.138-143
    • /
    • 2006
  • In this paper, we have presented a Sequential Agglomerative Hierarchical Nested (SAHN) algorithm-based data clustering method in fuzzy inference system to achieve optimal performance of fuzzy model. SAHN-based algorithm is used to give possible range of number of clusters with cluster centers for the system identification. The axes of membership functions of this fuzzy model are optimized by using cluster centers obtained from clustering method and the consequence parameters of the fuzzy model are identified by standard least square method. Finally, in this paper, we have observed our model's output performance using the Box and Jenkins's gas furnace data and Sugeno's non-linear process data.

Clustering of Stereo Matching Data for Vehicle Segmentation (차량분리를 위한 스테레오매칭 데이터의 클러스터링)

  • Lee, Ki-Yong;Lee, Joon-Woong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.8
    • /
    • pp.744-750
    • /
    • 2010
  • To segment instances of vehicle classes in a sparse stereo-matching data set, this paper presents an algorithm for clustering based on DP (Dynamic Programming). The algorithm is agglomerative: it begins with each element in the set as a separate cluster and merges them into successively larger clusters according to similarity of two clusters. Here, similarity is formulated as a cost function of DP. The proposed algorithm is proven to be effective by experiments performed on various images acquired by a moving vehicle.

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2003.05a
    • /
    • pp.729-733
    • /
    • 2003
  • The presence of local features within clusters incurred by multi-modal nature of data prohibits many conventional clustering techniques from working properly. Especially, the clustering of datasets with non-Gaussian distributions within a cluster can be problematic when the technique with implicit assumption of Gaussian distribution is used. Current study proposes a simple tandem clustering method composed of k-means type algorithm and hierarchical method to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by k-means or fuzzy k-means algorithm. The pre-clusters found from the first step are to be clustered again using agglomerative hierarchical clustering method with Kullback- Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting the multi-modal clusters but also fast and easy in terms of computation complexity and relatively robust at the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six sets of publicly known real world data.

  • PDF

A Comparative Study on the Agglomerative and Divisive Methods for Hierarchical Document Clustering (계층적 문서 클러스터링을 위한 응집식 기법과 분할식 기법의 비교 연구)

  • Lee, Jae-Yun;Jeong, Jin-Ah
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2005.08a
    • /
    • pp.65-70
    • /
    • 2005
  • 계층적 문서 클러스터링에 있어서 실험집단에 따라 응집식 기법과 분할식 기법의 성능이 다르며, 이를 좌우하는 요소는 분류의 깊이, 즉 분류수준이라고 가정하였다. 조금만 나누면 되는 대분류인 경우는 상대적으로 분할식 기법이 유리하고, 조금만 합치면 되는 소분류인 경우에는 응집식 기법이 유리할 것이라고 판단했기 때문이다. 그에 따라 분할식 클러스터링 기법인 양분(Bisecting) K-means기법과 응집식 기법인 완전연결, 평균연결, WARD기법의 성능을 실험집단이 대분류인 경우와 소분류인 경우의 유사계수를 적용하여 각 기법별 성능을 비교하여 실험집단의 특성에 따른 적합 클러스터링 기법을 찾고자 하였다. 실험결과 응집식 기법과 분할식 기법의 성능 우열에 영향을 미치는 것은 분류수준보다는 변이계수로 측정된 상대적인 군집의 크기 편차인 것으로 나타났다.

  • PDF

Unsupervised Image Classification using Region-growing Segmentation based on CN-chain

  • Lee, Sang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.20 no.3
    • /
    • pp.215-225
    • /
    • 2004
  • A multistage hierarchical clustering technique, which is an unsupervised technique, was suggested in this paper for classifying large remotely-sensed imagery. The multistage algorithm consists of two stages. The 'local' segmentor of the first stage performs region-growing segmentation by employing the hierarchical clustering procedure of CN-chain with the restriction that pixels in a cluster must be spatially contiguous. The 'global' segmentor of the second stage, which has not spatial constraints for merging, clusters the segments resulting from the previous stage, using the conventional agglomerative approach. Using simulation data, the proposed method was compared with another hierarchical clustering technique based on 'mutual closest neighbor.' The experimental results show that the new approach proposed in this study considerably increases in computational efficiency for larger images with a low number of bands. The technique was then applied to classify the land-cover types using the remotely-sensed data acquired from the Korean peninsula.

Underdetermined blind source separation using normalized spatial covariance matrix and multichannel nonnegative matrix factorization (멀티채널 비음수 행렬분해와 정규화된 공간 공분산 행렬을 이용한 미결정 블라인드 소스 분리)

  • Oh, Son-Mook;Kim, Jung-Han
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.2
    • /
    • pp.120-130
    • /
    • 2020
  • This paper solves the problem in underdetermined convolutive mixture by improving the disadvantages of the multichannel nonnegative matrix factorization technique widely used in blind source separation. In conventional researches based on Spatial Covariance Matrix (SCM), each element composed of values such as power gain of single channel and correlation tends to degrade the quality of the separated sources due to high variance. In this paper, level and frequency normalization is performed to effectively cluster the estimated sources. Therefore, we propose a novel SCM and an effective distance function for cluster pairs. In this paper, the proposed SCM is used for the initialization of the spatial model and used for hierarchical agglomerative clustering in the bottom-up approach. The proposed algorithm was experimented using the 'Signal Separation Evaluation Campaign 2008 development dataset'. As a result, the improvement in most of the performance indicators was confirmed by utilizing the 'Blind Source Separation Eval toolbox', an objective source separation quality verification tool, and especially the performance superiority of the typical SDR of 1 dB to 3.5 dB was verified.

Detection of M:N corresponding class group pairs between two spatial datasets with agglomerative hierarchical clustering (응집 계층 군집화 기법을 이용한 이종 공간정보의 M:N 대응 클래스 군집 쌍 탐색)

  • Huh, Yong;Kim, Jung-Ok;Yu, Ki-Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.30 no.2
    • /
    • pp.125-134
    • /
    • 2012
  • In this paper, we propose a method to analyze M:N corresponding relations in semantic matching, especially focusing on feature class matching. Similarities between any class pairs are measured by spatial objects which coexist in the class pairs, and corresponding classes are obtained by clustering with these pairwise similarities. We applied a graph embedding method, which constructs a global configuration of each class in a low-dimensional Euclidean space while preserving the above pairwise similarities, so that the distances between the embedded classes are proportional to the overall degree of similarity on the edge paths in the graph. Thus, the clustering problem could be solved by employing a general clustering algorithm with the embedded coordinates. We applied the proposed method to polygon object layers in a topographic map and land parcel categories in a cadastral map of Suwon area and evaluated the results. F-measures of the detected class pairs were analyzed to validate the results. And some class pairs which would not detected by analysis on nominal class names were detected by the proposed method.

Research on the Development of Distance Metrics for the Clustering of Vessel Trajectories in Korean Coastal Waters (국내 연안 해역 선박 항적 군집화를 위한 항적 간 거리 척도 개발 연구)

  • Seungju Lee;Wonhee Lee;Ji Hong Min;Deuk Jae Cho;Hyunwoo Park
    • Journal of Navigation and Port Research
    • /
    • v.47 no.6
    • /
    • pp.367-375
    • /
    • 2023
  • This study developed a new distance metric for vessel trajectories, applicable to marine traffic control services in the Korean coastal waters. The proposed metric is designed through the weighted summation of the traditional Hausdorff distance, which measures the similarity between spatiotemporal data and incorporates the differences in the average Speed Over Ground (SOG) and the variance in Course Over Ground (COG) between two trajectories. To validate the effectiveness of this new metric, a comparative analysis was conducted using the actual Automatic Identification System (AIS) trajectory data, in conjunction with an agglomerative clustering algorithm. Data visualizations were used to confirm that the results of trajectory clustering, with the new metric, reflect geographical distances and the distribution of vessel behavioral characteristics more accurately, than conventional metrics such as the Hausdorff distance and Dynamic Time Warping distance. Quantitatively, based on the Davies-Bouldin index, the clustering results were found to be superior or comparable and demonstrated exceptional efficiency in computational distance calculation.

Hierarchical Clustering Methodology for Source Code Plagiarism Detection (계층적 군집화 기법을 이용한 소스 코드 표절 검사)

  • Sohn, Ki-Rack;Moon, Seung-Mi
    • Journal of The Korean Association of Information Education
    • /
    • v.11 no.1
    • /
    • pp.91-98
    • /
    • 2007
  • Plagiarism is a serious problem in school education due to current technologies such as the internet and word processors. This paper presents how to detect source code plagiarism using similarity based on string comparison methods. The main contribution is to use hierarchical agglomerative clustering technique to classify plagiarism groups, which are then visualized as a dendrogram. Graders can set an empirical threshold to the dendrogram to navigate plagiarism groups. We evaluated the performance of the presented method with a real world data. The result showed the usefulness and applicability of this method.

  • PDF