• Title/Summary/Keyword: Agglomerative clustering algorithms

Application of Principal Component Analysis Prior to Cluster Analysis in the Concept of Informative Variables

  • Chae, Seong-San
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.1057-1068
    • /
    • 2003
  • Results of using principal component analysis prior to cluster analysis are compared with results from applying an agglomerative clustering algorithm alone. The retrieval ability of the agglomerative clustering algorithm is improved in some situations by using principal components prior to cluster analysis. On the other hand, the loss in retrieval ability for the agglomerative clustering algorithms decreases as the number of informative variables increases, where informative variables are those that carry distinct (or necessary) information compared to the other variables.
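
The abstract above describes running agglomerative clustering on principal component scores instead of the raw variables. A minimal sketch of that pipeline, assuming scikit-learn, synthetic data, three retained components, and Ward linkage (none of which are the paper's actual settings):

    # Sketch: agglomerative clustering applied to principal component scores.
    # Assumptions: scikit-learn, 3 retained components, Ward linkage, toy data.
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import adjusted_rand_score

    X, y_true = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)
    X_std = StandardScaler().fit_transform(X)

    # Cluster the raw variables versus the leading principal components.
    labels_raw = AgglomerativeClustering(n_clusters=4).fit_predict(X_std)
    scores = PCA(n_components=3).fit_transform(X_std)
    labels_pca = AgglomerativeClustering(n_clusters=4).fit_predict(scores)

    print("recovery without PCA:", adjusted_rand_score(y_true, labels_raw))
    print("recovery with PCA   :", adjusted_rand_score(y_true, labels_pca))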

Classification of basin characteristics related to inundation using clustering (군집분석을 이용한 침수관련 유역특성 분류)

  • Lee, Han Seung;Cho, Jae Woong;Kang, Ho seon;Hwang, Jeong Geun;Moon, Hae Jin
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.96-96
    • /
    • 2020
  • In order to establish risk criteria for inundation due to typhoons or heavy rainfall, research is underway to predict the limit rainfall using basin characteristics, limit rainfall, and artificial intelligence algorithms. To improve model performance in estimating the limit rainfall, the training data are pre-processed before use. When 50.0% of the entire data set was removed as outliers in the pre-processing step, the accuracy was confirmed to be over 90%. However, the utilization rate of the training data is then very low, so various characteristics cannot be considered. Accordingly, to predict the limit rainfall while reflecting various watershed characteristics and increasing the utilization rate of the training data, watersheds with similar characteristics were clustered. The algorithms used for clustering are K-Means, Agglomerative, DBSCAN, and Spectral Clustering. The K-Means, DBSCAN, and Agglomerative clustering algorithms group the watersheds mainly by impervious-area ratio, whereas the Spectral clustering algorithm produces various groupings depending on its parameters. If the results of the clustering algorithms are applied to the limit-rainfall prediction algorithm, various watershed characteristics will be considered and, at the same time, the performance of predicting the limit rainfall will be improved.
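
A rough sketch of the clustering step described above, with a hypothetical table of basin characteristics; the feature columns, cluster counts, and all parameters are illustrative assumptions rather than the authors' settings:

    # Sketch: clustering watershed characteristics with the four algorithms named above.
    # The feature matrix and all parameters here are illustrative assumptions.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, SpectralClustering

    rng = np.random.default_rng(0)
    # Hypothetical basin features: area, slope, impervious-area ratio, channel length.
    X = StandardScaler().fit_transform(rng.random((120, 4)))

    models = {
        "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
        "Agglomerative": AgglomerativeClustering(n_clusters=4),
        "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
        "Spectral": SpectralClustering(n_clusters=4, affinity="nearest_neighbors", random_state=0),
    }
    for name, model in models.items():
        labels = model.fit_predict(X)
        print(name, "->", len(set(labels) - {-1}), "clusters")  # -1 marks DBSCAN noise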

SDN-Based Hierarchical Agglomerative Clustering Algorithm for Interference Mitigation in Ultra-Dense Small Cell Networks

  • Yang, Guang;Cao, Yewen;Esmailpour, Amir;Wang, Deqiang
    • ETRI Journal
    • /
    • v.40 no.2
    • /
    • pp.227-236
    • /
    • 2018
  • Ultra-dense small cell networks (UD-SCNs) have been identified as a promising scheme for next-generation wireless networks capable of meeting the ever-increasing demand for higher transmission rates and better quality of service. However, UD-SCNs will inevitably suffer from severe interference among the small cell base stations, which will lower their spectral efficiency. In this paper, we propose a software-defined networking (SDN)-based hierarchical agglomerative clustering (SDN-HAC) framework, which leverages SDN to centrally control all sub-channels in the network, and decides on cluster merging using a similarity criterion based on a suitability function. We evaluate the proposed algorithm through simulation. The obtained results show that the proposed algorithm performs well and improves system payoff by 18.19% and 436.34% when compared with the traditional network architecture algorithms and non-cooperative scenarios, respectively.
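
The suitability function and SDN control logic are not given in the abstract, so the following is only a toy sketch of greedy agglomerative merging driven by a placeholder pairwise score; suitability(), the gain matrix, and the stopping threshold are all hypothetical:

    # Sketch: greedy hierarchical agglomerative merging of small cells.
    # suitability() is a placeholder; the paper's actual function is not reproduced.
    import itertools

    def suitability(cluster_a, cluster_b, gain):
        # Hypothetical score: average pairwise gain from merging the two clusters.
        pairs = [(i, j) for i in cluster_a for j in cluster_b]
        return sum(gain[i][j] for i, j in pairs) / len(pairs)

    def agglomerate(n_cells, gain, threshold=0.5):
        clusters = [{i} for i in range(n_cells)]
        while len(clusters) > 1:
            a, b = max(itertools.combinations(range(len(clusters)), 2),
                       key=lambda p: suitability(clusters[p[0]], clusters[p[1]], gain))
            if suitability(clusters[a], clusters[b], gain) < threshold:
                break  # no remaining merge improves the score enough
            clusters[a] |= clusters[b]
            del clusters[b]
        return clusters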

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San;Kim, Chansoo;Warde, William D.
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.2
    • /
    • pp.369-380
    • /
    • 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering, or one close to the original clustering. The recovery level obtained using Euclidean distance and distances transformed from correlation coefficients is evaluated and compared using Rand's (1971) C statistic. The C values indicate how close the resultant clustering is to the original clustering. In a simulation study, the recovery level is improved by applying the correlation coefficients between objects. Using the data set from Spellman et al. (1998), the recovery levels with different similarity measures are also presented. In general, the recovery level of the true clusters was increased by using the correlation coefficients.
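
A minimal sketch of the comparison described above, assuming a recent scikit-learn, average linkage, and the transform d = 1 - r from correlation to distance (the paper's exact transform and settings may differ):

    # Sketch: average-linkage clustering under Euclidean vs. correlation-based distance,
    # scored against the true partition with a Rand-type statistic.
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import rand_score

    X, y_true = make_blobs(n_samples=150, n_features=20, centers=3, random_state=1)

    # Note: metric= requires scikit-learn >= 1.2 (older releases use affinity=).
    euclid = AgglomerativeClustering(n_clusters=3, linkage="average", metric="euclidean")
    labels_euclid = euclid.fit_predict(X)

    # Distance transformed from correlation coefficients between objects: d = 1 - r.
    corr_dist = 1.0 - np.corrcoef(X)
    corr = AgglomerativeClustering(n_clusters=3, linkage="average", metric="precomputed")
    labels_corr = corr.fit_predict(corr_dist)

    print("Rand statistic, Euclidean  :", rand_score(y_true, labels_euclid))
    print("Rand statistic, correlation:", rand_score(y_true, labels_corr))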

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.3
    • /
    • pp.719-732
    • /
    • 2006
  • A set of clustering algorithms with proper weights in the formulation of distance, extended to mixed numeric and multiple binary values, is presented. Simple matching and Jaccard coefficients are used to measure similarity between objects on the multiple binary attributes. Similarities are converted to dissimilarities between the i-th and j-th objects. The performance of clustering algorithms with a balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms with a properly chosen weight give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
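
A toy sketch of a balanced mixed-type dissimilarity in the spirit of the abstract; the balancing weight w, the [0, 1] scaling of numeric attributes, and the simple-matching/Jaccard switch are illustrative assumptions:

    # Sketch: weighted dissimilarity combining numeric and multiple binary attributes.
    # The balancing weight w and the simple-matching/Jaccard choice are assumptions.
    import numpy as np

    def mixed_dissimilarity(xi, xj, num_idx, bin_idx, w=0.5, use_jaccard=False):
        # Numeric part: normalized Euclidean distance (assumes attributes scaled to [0, 1]).
        d_num = np.linalg.norm(xi[num_idx] - xj[num_idx]) / np.sqrt(len(num_idx))
        a = xi[bin_idx].astype(bool)
        b = xj[bin_idx].astype(bool)
        if use_jaccard:
            union = np.logical_or(a, b).sum()
            sim = np.logical_and(a, b).sum() / union if union else 1.0  # Jaccard coefficient
        else:
            sim = np.mean(a == b)                                       # simple matching
        d_bin = 1.0 - sim
        return w * d_num + (1.0 - w) * d_bin

Evaluating this function over all object pairs yields a precomputed dissimilarity matrix that can be handed to an agglomerative clustering routine.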

Non-linearity Mitigation Method of Particulate Matter using Machine Learning Clustering Algorithms (기계학습 군집 알고리즘을 이용한 미세먼지 비선형성 완화방안)

  • Lee, Sang-gwon;Cho, Kyoung-woo;Oh, Chang-heon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.341-343
    • /
    • 2019
  • As the occurrence of high-concentration particulate matter increases, much attention is focused on the prediction of particulate matter. Particulate matter refers to particles less than 10 μm in diameter in the atmosphere and is affected by weather variables such as temperature, relative humidity, and wind speed. Therefore, various studies have been conducted to analyze the correlation with weather information for particulate matter prediction. However, the nonlinear time-series distribution of particulate matter increases the complexity of the prediction model and can lead to inaccurate predictions. In this paper, we try to mitigate the nonlinear characteristics of particulate matter by using machine learning clustering and classification algorithms. The clustering algorithms used are agglomerative clustering and density-based spatial clustering of applications with noise (DBSCAN).
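
A rough sketch of the grouping step described above with hypothetical weather and PM10 features; column choices, cluster counts, and DBSCAN parameters are placeholders, not the authors' settings:

    # Sketch: grouping PM10 observations by weather regime before building a predictor.
    # Column names, cluster counts, and DBSCAN parameters are illustrative assumptions.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering, DBSCAN

    rng = np.random.default_rng(0)
    # Hypothetical features: temperature, relative humidity, wind speed, PM10.
    X = StandardScaler().fit_transform(rng.random((500, 4)))

    labels_hac = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    labels_db = DBSCAN(eps=0.6, min_samples=10).fit_predict(X)

    # Each cluster can then be modeled separately to reduce the nonlinearity a single
    # global prediction model would otherwise have to absorb.
    for k in np.unique(labels_hac):
        print("HAC cluster", k, "size", np.sum(labels_hac == k))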

Performance Comparison of Clustering Techniques for Spatio-Temporal Data (시공간 데이터를 위한 클러스터링 기법 성능 비교)

  • Kang Nayoung;Kang Juyoung;Yong Hwan-Seung
    • Journal of Intelligence and Information Systems
    • /
    • v.10 no.2
    • /
    • pp.15-37
    • /
    • 2004
  • With the growth in the size of datasets, data mining has recently become an important research topic. In particular, interest in spatio-temporal data mining has increased; it analyzes massive spatio-temporal data collected from a wide variety of applications such as GPS data, trajectory data from surveillance systems, and earth geographic data. In earlier approaches, conventional clustering algorithms were applied as spatio-temporal data mining techniques without any modification. In this paper, we focus on SOM, the most common clustering algorithm applied to clustering analysis in data mining, and develop a spatio-temporal data mining module based on it. In addition, we analyze the clustering results of the developed SOM module and compare them with those of the K-means and agglomerative hierarchical algorithms in terms of homogeneity, separation, silhouette width, and accuracy. We also developed a specialized visualization module for more accurate interpretation of the mining results.
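
A small sketch of the comparison by silhouette width mentioned above; a SOM would require a separate library (for example minisom), so only the k-means and agglomerative baselines are shown, on placeholder feature vectors:

    # Sketch: comparing clusterings by silhouette width, one of the criteria above.
    # The SOM module itself is not reproduced; data and cluster counts are assumptions.
    import numpy as np
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(42)
    X = rng.random((200, 6))  # placeholder spatio-temporal feature vectors

    for name, model in {
        "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0),
        "Agglomerative": AgglomerativeClustering(n_clusters=5),
    }.items():
        labels = model.fit_predict(X)
        print(name, "silhouette width:", round(silhouette_score(X, labels), 3))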

Visualizing Cluster Hierarchy Using Hierarchy Generation Framework (계층 발생 프레임워크를 이용한 군집 계층 시각화)

  • Shin, DongHwa;L'Yi, Sehi;Seo, Jinwook
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.6
    • /
    • pp.436-441
    • /
    • 2015
  • There are many types of clustering algorithms, such as centroid, hierarchical, or density-based methods. Each algorithm has unique data grouping principles, which creates different varieties of clusters. Ordering Points To Identify the Clustering Structure (OPTICS) is a well-known density-based algorithm for analyzing arbitrarily shaped and varying-density clusters, but the clusters it obtains correlate only loosely. Hierarchical agglomerative clustering (HAC) reveals a hierarchical structure of clusters, but is unable to clearly find non-convex shaped clusters. In this paper, we provide a novel hierarchy generation framework and application which can aid users by combining the advantages of the two clustering methods.
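
A small sketch running the two algorithms the framework combines side by side, using scikit-learn and SciPy; parameters and data are illustrative, and the hierarchy-generation step itself (the paper's contribution) is not reproduced:

    # Sketch: OPTICS (density-based) and hierarchical agglomerative clustering (HAC),
    # the two methods whose strengths the framework above combines.
    from sklearn.datasets import make_moons
    from sklearn.cluster import OPTICS
    from scipy.cluster.hierarchy import linkage, fcluster

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # OPTICS handles the non-convex, varying-density shapes...
    optics_labels = OPTICS(min_samples=10).fit_predict(X)

    # ...while HAC exposes an explicit cluster hierarchy (dendrogram) to cut.
    Z = linkage(X, method="average")
    hac_labels = fcluster(Z, t=2, criterion="maxclust")

    print("OPTICS clusters:", sorted(set(optics_labels) - {-1}))
    print("HAC clusters   :", sorted(set(hac_labels)))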

A Method to Predict the Number of Clusters

  • Chae, Seong-San;Warde, William D.
    • Journal of the Korean Statistical Society
    • /
    • v.20 no.2
    • /
    • pp.162-176
    • /
    • 1991
  • The problem of determining the number of clusters, K, is the main objective of this study. Attention is focused on the use of Rand's (1971) C_k statistic with some agglomerative clustering algorithms (ACA) defined in the (β, π) plane in predicting the number of clusters within a given set of data. The (k, C_k) plots for k = 1, 2, ..., N are explored by a Monte Carlo study. Based on its performance, the use of C_k with the pair of ACA, (-.5, .75) and (-.25, .0), is recommended for predicting the number of clusters present within a set of data.
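
The flexible-beta ACA family from the (β, π) plane is not reproduced here; the following is only a generic sketch of tracking a Rand-type agreement statistic C_k over candidate k, with average and complete linkage standing in for the recommended pair of algorithms and a simple "pick the most agreeing k" rule standing in for the paper's decision procedure:

    # Sketch: tracking a Rand-type statistic C_k across candidate numbers of clusters k.
    # Two standard linkages stand in for the paper's flexible-beta ACA pair.
    from sklearn.datasets import make_blobs
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import rand_score

    X, _ = make_blobs(n_samples=200, centers=4, random_state=3)

    c_k = {}
    for k in range(2, 11):
        part_a = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(X)
        part_b = AgglomerativeClustering(n_clusters=k, linkage="complete").fit_predict(X)
        c_k[k] = rand_score(part_a, part_b)  # agreement between the two partitions at k

    best_k = max(c_k, key=c_k.get)
    print("C_k by k:", {k: round(v, 3) for k, v in c_k.items()})
    print("suggested number of clusters:", best_k)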

A Comparative Study of Determining the Number of Clusters with a Method Proposed (군집수의 예측에 관한 방법의 제안 및 비교)

  • Chae, Seong-San;Lim, Nam-Kyoo
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.329-341
    • /
    • 2005
  • A method of determining the number of clusters is proposed based on some asymptotic results on Rand's (1971) C_k statistic, k = 2, 3, ..., N-1. Simulation is conducted to compare the proposed method with those of Chae and Warde (1991) and Huh and Lee (2004).