• Title/Summary/Keyword: Agglomerative

Performance Comparison of Clustering Techniques for Spatio-Temporal Data (시공간 데이터를 위한 클러스터링 기법 성능 비교)

  • Kang Nayoung;Kang Juyoung;Yong Hwan-Seung
    • Journal of Intelligence and Information Systems
    • /
    • v.10 no.2
    • /
    • pp.15-37
    • /
    • 2004
  • With the growth in the size of datasets, data mining has recently become an important research topic. In particular, interest has grown in spatio-temporal data mining, a method for analyzing massive spatio-temporal data collected from a wide variety of applications, such as GPS data, trajectory data from surveillance systems, and earth geographic data. In earlier approaches, conventional clustering algorithms were applied as spatio-temporal data mining techniques without any modification. In this paper, we focus on SOM, the most common clustering algorithm applied to clustering analysis in data mining, and develop a spatio-temporal data mining module based on it. In addition, we analyze the clustering results of the developed SOM module and compare them with those of K-means and the Agglomerative Hierarchical algorithm in terms of homogeneity, separation, silhouette width, and accuracy. We also developed a specialized visualization module for more accurate interpretation of the mining results.
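
Of the comparison criteria named in this abstract, silhouette width is the easiest to illustrate. The sketch below is not the authors' evaluation code: it assumes Euclidean distance and uses the standard per-point definition s(i) = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the smallest mean distance to any other cluster. The function name and the toy data are hypothetical.

```python
import math


def silhouette_width(points, labels):
    """Mean silhouette width of a labelled point set (Euclidean distance assumed)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:  # guard against coincident points
            total += (b - a) / denom
    return total / len(points)


# Toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_width(points, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_width(points, [0, 1, 0, 1])   # labels cut across the geometry
```

A labelling that follows the geometry scores close to 1, while one that cuts across it scores below 0, which is what makes the measure usable for comparing the output of different clustering algorithms on the same data.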

Highlight based Lyrics Search Considering the Characteristics of Query (사용자 질의어 특징을 반영한 하이라이트 기반 노래 가사 검색)

  • Kim, Kweon Yang
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.26 no.4
    • /
    • pp.301-307
    • /
    • 2016
  • This paper proposes a lyric search method that takes the characteristics of user queries into account. Because queries for lyric search tend to be derived from the highlight parts of the music, this paper uses hierarchical agglomerative clustering to find the highlight and proposes a Gaussian weighting that considers the neighborhood of the highlight as well as the highlight itself. With the mean of the Gaussian set at the highlight, the weighting function gives higher weights near the highlight and lower weights far from it. The paper then constructs an index of lyrics with the Gaussian weighting. Experimental results on a data set obtained from 5 real users show that the proposed method is effective.
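
The weighting idea in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: lyric lines are indexed by position, the highlight is given as a line index, the spread `sigma` is a made-up parameter, and the index simply sums the weights of the lines in which each term occurs.

```python
import math


def gaussian_weights(num_lines, highlight, sigma=2.0):
    """Weight per lyric line: 1.0 at the highlight, decaying with distance from it."""
    return [
        math.exp(-((i - highlight) ** 2) / (2 * sigma ** 2))
        for i in range(num_lines)
    ]


def weighted_term_index(lines, highlight, sigma=2.0):
    """Map each term to the summed weight of the lines it occurs in (hypothetical index)."""
    weights = gaussian_weights(len(lines), highlight, sigma)
    index = {}
    for line, weight in zip(lines, weights):
        for term in set(line.lower().split()):
            index[term] = index.get(term, 0.0) + weight
    return index


# Made-up lyrics; line 1 is assumed to be the highlight.
lyrics = ["hello world", "la la la", "hello again"]
index = weighted_term_index(lyrics, highlight=1)
```

In this toy index a term that occurs only in the highlight line receives the full weight of 1.0, and terms from neighbouring lines are discounted symmetrically on either side.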

Classification and Retrieval of Object-Oriented Reuse Components with HACM (HACM을 사용한 객체지향 재사용 부품의 분류와 검색)

  • Bae, Je-Min;Kim, Sang-Geun;Lee, Kyung-Whan
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.7
    • /
    • pp.1733-1748
    • /
    • 1997
  • In this paper, we propose a classification scheme and a retrieval mechanism that can be applied to many application domains in order to construct a software reuse library. The classification scheme, which is the core of accessibility in reuse, is defined as a hierarchical structure built from agglomerative clusters. An agglomerative cluster is a group of reuse components related by function. Functional relationships are measured with HACM, a representation method for software components that is used to calculate the similarities among the classes in a particular domain. Clustering information is added to the library structure, which determines the functionality and accuracy of the retrieval system. The system stores classification results such as the weighted index information, the similarity matrix, and the hierarchical structure, so users can retrieve software components with natural-language queries. This study focuses on the findability of software components in the reuse library. As a result, part of the construction process of the reuse library was automated, and an object-oriented reuse library can be constructed with extensibility and with relationships among the reuse components. The process is also visualized through the browse hierarchy of the retrieval environment, and the retrieval system is integrated into the reuse system CARS 2.1.

Morphometric Characterisation of Root-Knot Nematode Populations from Three Regions in Ghana

  • Nyaku, Seloame Tatu;Lutuf, Hanif;Cornelius, Eric
    • The Plant Pathology Journal
    • /
    • v.34 no.6
    • /
    • pp.544-554
    • /
    • 2018
  • Tomato (Solanum lycopersicum) production in Ghana is limited by the root-knot nematode (Meloidogyne incognita), and yield losses of over 70% have been experienced in farmers' fields. Major management strategies for the root-knot nematode (RKN), such as nematicide application and crop rotation, are either of limited efficiency or harmful to the environment, and carry high control costs. Therefore, this study examines morphometric variation in RKN populations in Ghana using principal component analysis (PCA); the resulting information can be used to develop tomato cultivars resistant to RKN. Ninety second-stage juveniles (J2) and 16 adult males of M. incognita were morphometrically characterized. Six and five morphometric variables were measured for adult males and J2, respectively. Morphological measurements showed differences among the adult males and second-stage juveniles (J2). A plot of PC1 against PC2 for M. incognita male populations showed clustering into three main groups. Populations from Asuosu and Afrancho (Group I) were more closely related compared to populations from Tuobodom and Vea (Group II). There was, however, a single nematode from Afrancho (AF4) that fell into Group III. Biplots for male populations indicate that body length, DEGO, greatest body width, and gubernaculum length are the variables distinguishing Group I and Group II populations. The same groupings from the PCA were reflected in the dendrogram generated using Agglomerative Hierarchical Clustering (AHC). This study provides the first report on morphometric characterisation of M. incognita male and juvenile populations in Ghana, showing significant morphological variation.
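
For readers unfamiliar with how an AHC dendrogram is produced, here is a minimal sketch. The abstract does not state which linkage or distance the study used, so average linkage with Euclidean distance is an assumption, and the points are made-up stand-ins for PC scores, not the study's measurements.

```python
import math
from itertools import combinations


def average_linkage(points):
    """Naive agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(
            math.dist(points[i], points[j]) for i in a for j in b
        ) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges


# Hypothetical 2-D scores for five individuals.
scores = [(0, 0), (1, 0), (5, 5), (5, 7), (20, 0)]
for left, right, height in average_linkage(scores):
    print(left, right, round(height, 2))
```

Each printed row is one join in the dendrogram, with the height at which it occurs. The sketch runs in cubic time and is meant for reading, not for large data sets.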

Non-linearity Mitigation Method of Particulate Matter using Machine Learning Clustering Algorithms (기계학습 군집 알고리즘을 이용한 미세먼지 비선형성 완화방안)

  • Lee, Sang-gwon;Cho, Kyoung-woo;Oh, Chang-heon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.341-343
    • /
    • 2019
  • As the occurrence of high-concentration particulate matter increases, much attention is focused on the prediction of particulate matter. Particulate matter refers to particles less than $10\,\mu\mathrm{m}$ in diameter in the atmosphere and is affected by weather conditions such as temperature, relative humidity, and wind speed. Therefore, various studies have analyzed its correlation with weather information for particulate matter prediction. However, the nonlinear time series distribution of particulate matter increases the complexity of the prediction model and can lead to inaccurate predictions. In this paper, we try to mitigate the nonlinear characteristics of particulate matter by using clustering and classification algorithms from machine learning. The machine learning algorithms used are agglomerative clustering and density-based spatial clustering of applications with noise (DBSCAN).

A Method to Predict the Number of Clusters

  • Chae, Seong-San;William D. Warde
    • Journal of the Korean Statistical Society
    • /
    • v.20 no.2
    • /
    • pp.162-176
    • /
    • 1991
  • Determining the number of clusters, K, is the main objective of this study. Attention is focused on the use of Rand's (1971) $C_{k}$ statistic with some agglomerative clustering algorithms (ACA) defined in the ($\beta$, $\pi$) plane to predict the number of clusters within a given set of data. The (k, $C_{k}$) plots for k = 1, 2, …, N are explored in a Monte Carlo study. Based on its performance, the use of $C_{k}$ with the pair of ACA, (-.5, .75) and (-.25, .0), is recommended for predicting the number of clusters present within a set of data.
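
Rand's (1971) statistic measures the agreement between two partitions of the same objects as the fraction of object pairs that both partitions treat the same way (together in both, or apart in both). The sketch below computes that quantity for two label vectors. How the paper obtains the two partitions at each level k is not spelled out in the abstract, so the pairing shown here is only an assumption about the general idea.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree (Rand, 1971)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("need at least two objects to form a pair")

    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agree += same_in_a == same_in_b  # True counts as 1
        pairs += 1
    return agree / pairs


# Hypothetical partitions of four objects into k = 2 clusters by two algorithms.
identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same grouping, relabelled
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])    # groupings cut across each other
```

The statistic depends only on which objects are grouped together, not on the label values, so relabelling the clusters leaves it unchanged.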

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.3
    • /
    • pp.719-732
    • /
    • 2006
  • A set of clustering algorithms is presented in which a proper weight is applied in the formulation of a distance that extends to mixed numeric and multiple binary values. Simple matching and Jaccard coefficients are used to measure similarity between objects for multiple binary attributes. Similarities are converted to dissimilarities between the $i$th and $j$th objects. The performance of clustering algorithms with a balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms with a proper weight applied give a competitive recovery level when a set of data with mixed numeric and multiple binary attributes is clustered.
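
The two binary similarity coefficients named here have standard definitions, sketched below together with one possible way to balance them against a numeric distance. The convex combination in `mixed_dissimilarity` and its `weight` parameter are hypothetical; the abstract does not give the paper's actual weighting formula. Numeric attributes are assumed to be scaled to a comparable range beforehand.

```python
import math


def simple_matching(x, y):
    """Share of binary attributes on which two objects agree (1-1 and 0-0 both count)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard(x, y):
    """Share of 1-1 matches among attributes where at least one object has a 1."""
    both = sum(a == 1 and b == 1 for a, b in zip(x, y))
    either = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both / either if either else 1.0  # assumption: all-zero pairs count as identical


def mixed_dissimilarity(num_i, num_j, bin_i, bin_j, weight=0.5, similarity=jaccard):
    """Hypothetical balance of a numeric distance and a binary dissimilarity."""
    numeric_part = math.dist(num_i, num_j)
    binary_part = 1.0 - similarity(bin_i, bin_j)  # similarity converted to dissimilarity
    return weight * numeric_part + (1.0 - weight) * binary_part


# Two made-up objects with two numeric and four binary attributes each.
d = mixed_dissimilarity((0.0, 0.0), (3.0, 4.0), [1, 0, 1, 0], [1, 1, 0, 0])
```

Simple matching rewards shared absences as well as shared presences, while Jaccard ignores shared absences, so the choice matters most when the binary attributes are sparse.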

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

  • Park, Nojin;Ko, Hanseok
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.1
    • /
    • pp.1-7
    • /
    • 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional clustering methods such as stacked auto-encoder based k-means are not effective, since they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality effectively, using a convolutional auto-encoder to capture and reflect local information, and then accurately clusters similar data samples using a hierarchical clustering approach. The experimental results confirm that the proposed model improves the clustering results in terms of clustering accuracy and normalized mutual information.

Efficient Superpixel Generation Method Based on Image Complexity

  • Park, Sanghyun
    • Journal of Multimedia Information System
    • /
    • v.7 no.3
    • /
    • pp.197-204
    • /
    • 2020
  • Superpixel methods are widely used in the preprocessing stage of computer vision applications to reduce computational complexity by simplifying images while maintaining their characteristics. It is common to generate superpixels of similar size and shape based on the pixel values rather than considering the characteristics of the image. In this paper, we propose a method to control the sizes and shapes of generated superpixels, considering the contents of an image. The proposed method consists of two steps. The first step is to over-segment an image so that the boundary information of the image is well preserved. In the second step, generated superpixels are merged based on similarity to produce the target number of superpixels, where the shapes of superpixels are controlled by limiting the maximum size and by the proposed roundness metric. Experimental results show that the proposed method preserves the boundaries of the objects in an image more accurately than the existing method.
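
The second step described above, merging over-segmented regions until a target count is reached while capping region size, can be sketched as a greedy loop. This is a simplified illustration, not the paper's algorithm: similarity is assumed to be the difference in mean intensity, the roundness metric is left out because the abstract does not define it, and all names and data are hypothetical.

```python
def merge_regions(regions, adjacency, target, max_size):
    """Greedily merge adjacent regions with the closest mean values.

    regions:   dict mapping region id -> (mean_value, pixel_count)
    adjacency: set of frozenset({id_a, id_b}) for neighbouring regions
    Stops at `target` regions, or earlier if every merge would exceed `max_size`.
    """
    regions = dict(regions)
    adjacency = set(adjacency)
    while len(regions) > target:
        # Only neighbours whose combined size respects the cap may merge.
        candidates = [
            pair for pair in adjacency
            if sum(regions[r][1] for r in pair) <= max_size
        ]
        if not candidates:
            break
        # Most similar pair first; ties broken by region ids for determinism.
        best = min(
            candidates,
            key=lambda p: (abs(regions[min(p)][0] - regions[max(p)][0]), sorted(p)),
        )
        a, b = sorted(best)
        (mean_a, size_a), (mean_b, size_b) = regions[a], regions.pop(b)
        size = size_a + size_b
        regions[a] = ((mean_a * size_a + mean_b * size_b) / size, size)
        # Re-point b's neighbours at a, then drop the now-internal a-b edge.
        adjacency = {frozenset(a if r == b else r for r in pair) for pair in adjacency}
        adjacency = {pair for pair in adjacency if len(pair) == 2}
    return regions


# Four made-up regions in a row: 0 - 1 - 2 - 3.
initial = {0: (10.0, 4), 1: (12.0, 4), 2: (50.0, 4), 3: (52.0, 4)}
edges = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
merged = merge_regions(initial, edges, target=2, max_size=8)
```

With the size cap at 8 pixels, the two dark regions merge and the two bright regions merge, and no further merging is possible even if a smaller target is requested.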

A Comparative Study of Determining the Number of Clusters with a Method Proposed (군집수의 예측에 관한 방법의 제안 및 비교)

  • Chae, Seong-San;Lim, Nam-Kyoo
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.329-341
    • /
    • 2005
  • A method of determining the number of clusters is proposed based on some asymptotic results for Rand's (1971) statistic $C_k$, k = 2, 3, ..., N - 1. A simulation is conducted to compare the proposed method with those of Chae and Warde (1991) and Huh and Lee (2004).