• Title/Summary/Keyword: Correlation clustering

An Agglomerative Hierarchical Variable-Clustering Method Based on a Correlation Matrix

  • Lee, Kwangjin
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.387-397 / 2003
  • Most studies that require variable clustering use either exploratory factor analysis or a divisive hierarchical variable-clustering method based on a correlation matrix. Some researchers instead apply an object-clustering method to a distance matrix transformed from a correlation matrix, although this approach is known to be improper. This paper proposes an agglomerative hierarchical variable-clustering method based on the correlation matrix itself. The method is derived from a geometric concept using variate spaces and a characterizing variate.
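As a rough illustration of agglomerative clustering that works on a correlation matrix directly rather than on objects, the sketch below repeatedly merges the two variable clusters with the highest average absolute correlation. This is a generic average-linkage heuristic assumed for illustration. It does not reproduce the paper's variate-space and characterizing-variate construction, and the function names are hypothetical.

```python
from itertools import combinations


def avg_abs_corr(corr, a, b):
    """Average |r| between every variable in cluster a and every variable in cluster b."""
    total = sum(abs(corr[i][j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate_variables(corr, names):
    """Greedily merge variable clusters by highest average absolute correlation.

    corr  -- square correlation matrix (list of lists)
    names -- variable names, in the same order as the rows of corr
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(i,) for i in range(len(names))]  # start: one variable per cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are most strongly correlated on average.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: avg_abs_corr(corr, pair[0], pair[1]))
        sim = avg_abs_corr(corr, a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        history.append(([names[i] for i in a], [names[i] for i in b], sim))
    return history
```

With two strongly correlated pairs (x1, x2) and (x3, x4), each pair merges first and the two groups join last at their weak cross-correlation. Using |r| treats a strong negative correlation as association; that is a design choice here, not something the abstract specifies.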

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San; Kim, Chansoo; Warde, William D.
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.369-380 / 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering, or one close to the original clustering. The recovery levels obtained with Euclidean distance and with distances transformed from correlation coefficients are evaluated and compared using Rand's (1971) C statistic, whose value indicates how close the resulting clustering is to the original one. In a simulation study, using correlation coefficients between objects improved the recovery level. Recovery levels for the different similarity measures are also presented for the data set of Spellman et al. (1998). In general, using correlation coefficients increased the recovery of the true clusters.
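The comparison can be sketched as follows, assuming the correlation-based distance is d = 1 - r, which is one common transformation; the abstract does not say which one was used. `rand_c` computes Rand's (1971) C statistic as the fraction of object pairs on which two partitions agree, meaning both put the pair together or both keep it apart. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    # Assumed transformation: d = 1 - r, ranging from 0 (r = 1) to 2 (r = -1).
    return 1.0 - pearson(x, y)


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rand_c(labels_true, labels_pred):
    """Rand's C: fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred  # True counts as 1
    return agree / len(pairs)
```

Two profiles that differ only in scale, such as [1, 2, 3, 4] and [10, 20, 30, 40], have correlation distance 0 but a large Euclidean distance, which is why a correlation-based measure tends to group profiles by shape. Because C depends only on pairwise agreement, relabeling the clusters does not change it.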

Entropy-based Correlation Clustering for Wireless Sensor Networks in Multi-Correlated Regional Environments

  • Nga, Nguyen Thi Thanh; Khanh, Nguyen Kim; Hong, Son Ngo
    • IEIE Transactions on Smart Processing and Computing / v.5 no.2 / pp.85-93 / 2016
  • Correlation in sensed data offers significant potential advantages for developing efficient routing protocols in wireless sensor networks. This research proposes a new, simple method for clustering sensor nodes into correlation groups in areas with multiple correlations. First, the joint entropy of multiple sensed data is evaluated. Based on this evaluation, a correlation region is defined using entropy theory. A correlation clustering scheme with lower computational cost is then developed. The results are validated with a real data set.
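A minimal sketch of evaluating the joint entropy of several sensed series, assuming the readings have already been quantized into discrete symbols (the abstract does not give the quantization step). `total_correlation`, the sum of the marginal entropies minus the joint entropy, is one standard way to measure how much information the series share; it is shown for illustration and is not necessarily the paper's definition of a correlation region. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(*series):
    """Empirical joint entropy (bits) of aligned, already-discretized sensor readings."""
    samples = list(zip(*series))  # one tuple of readings per time step
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(*series):
    """Sum of marginal entropies minus joint entropy; 0 when series are independent."""
    return sum(joint_entropy(s) for s in series) - joint_entropy(*series)
```

Two identical binary series have a joint entropy of 1 bit instead of 2, so their total correlation is 1 bit, while two independent series have a total correlation of 0.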

Data Correlation-Based Clustering Algorithm in Wireless Sensor Networks

  • Yeo, Myung-Ho; Seo, Dong-Min; Yoo, Jae-Soo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.3 / pp.331-343 / 2009
  • Many types of sensor data exhibit strong correlation in both space and time, so both temporal and spatial suppression provide opportunities to reduce the energy cost of sensor data collection. Unfortunately, existing clustering algorithms can hardly exploit these opportunities, because they organize clusters based only on the distribution of sensor nodes or the network topology rather than on the correlation of sensor data. In this paper, we propose a novel clustering algorithm based on the correlation of sensor data. We modify the advertisement sub-phase and the TDMA schedule scheme so that clusters are formed from adjacent sensor nodes with similar readings. We also propose a spatio-temporal suppression scheme for our clustering algorithm. To show the advantages of our algorithm, we compare it with existing suppression algorithms in terms of sensor network lifetime and the amount of data collected at the base station. The experimental results show that our algorithm reduces the amount of collected data and prolongs the network lifetime.
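The two suppression ideas can be illustrated with a simple threshold rule. This is a sketch only, assuming a hypothetical tolerance `epsilon`: a node sends a reading only when it deviates from its last transmitted value (temporal suppression), and a cluster head forwards a single mean when all members agree within the tolerance (spatial suppression). The paper's modified advertisement sub-phase and TDMA schedule are not modeled, and the function names are hypothetical.

```python
def temporal_suppress(readings, epsilon):
    """Return the (time_index, value) pairs a node would actually transmit.

    A reading is sent only if it differs from the last *sent* value by more than epsilon.
    """
    sent = []
    last = None
    for t, value in enumerate(readings):
        if last is None or abs(value - last) > epsilon:
            sent.append((t, value))
            last = value
    return sent


def spatial_suppress(cluster_readings, epsilon):
    """Cluster-head view of one round: {node_id: reading} -> what gets forwarded.

    If every member is within epsilon of the cluster mean, forward only the mean;
    otherwise forward all member readings unchanged.
    """
    mean = sum(cluster_readings.values()) / len(cluster_readings)
    if all(abs(v - mean) <= epsilon for v in cluster_readings.values()):
        return {"cluster_mean": mean}
    return dict(cluster_readings)
```

For example, with `epsilon=0.5` the series [20.0, 20.1, 20.2, 21.0, 21.05] produces only two transmissions, at time steps 0 and 3.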

Design of Hierarchically Structured Clustering Algorithm and its Application

  • Bang, Young-Keun; Park, Ha-Yong; Lee, Chul-Heui
    • Journal of Industrial Technology / v.29 no.B / pp.17-23 / 2009
  • Clustering algorithms are widely used to extract useful information from non-linear data, and they strongly affect the performance of systems that process such data. This paper presents a new hierarchically structured clustering algorithm and applies it to a prediction system for non-linear time series. The proposed algorithm, HCKA (Hierarchical Cross-correlation and K-means clustering Algorithms), combines cross-correlation clustering with k-means clustering so that it captures both the correlation structure and the statistical characteristics of non-linear time series. First, optimal differences of the data are generated to reveal the characteristics of the non-linear time series. Second, the cross-correlation clustering algorithm classifies these differences into upper-level clusters for their predictors, and the k-means clustering algorithm then classifies the differences in each cluster into lower-level fuzzy sets. As a result, the proposed method classifies the data efficiently and improves prediction performance. Finally, we demonstrate the effectiveness of HCKA on typical time series examples.
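Two building blocks of this pipeline can be sketched: differencing a series and clustering the resulting values with plain Lloyd's k-means. This is a simplification under stated assumptions: the selection of optimal differences and the cross-correlation clustering stage are not reproduced, the lower level uses crisp k-means instead of fuzzy sets, and the function names are hypothetical.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars.

    Centers are initialised from evenly spaced positions in the sorted values, so the
    result is deterministic. Returns (centers, labels).
    """
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if empty).
        new_centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels
```

For instance, `difference([1, 3, 6, 10])` gives [2, 3, 4], and k-means with `k=2` on values near 1 and near 10 separates them into two groups.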

Comparison Study for Data Fusion and Clustering Classification Performances (Korean title: Comparison of Data Fusion and Cluster Analysis Classification Performance Using Taguchi Design)

  • Shin, Hyung-Won; Sohn, So-Young
    • Proceedings of the Korean Operations and Management Science Society Conference / 2000.04a / pp.601-604 / 2000
  • In this paper, we compare the classification performance of data fusion and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression under various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the input-output relationship is typically unknown, we treat it as a noise factor in a Taguchi design to make the results more practical. The experiments indicate the following. Clustering-based logistic regression provides the highest classification accuracy when input variables are weakly correlated and the variance of the data is high. When input variables are highly correlated, variable bagging performs better than logistic regression. When input variables are strongly correlated and the variance between observations is high, bagging appears marginally better than logistic regression, but the difference is not significant.
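Data bagging, one of the compared methods, trains each model on a bootstrap resample of the training data and combines the models by majority vote. The sketch below assumes a one-dimensional decision stump as the base learner to keep it short; the study's base model is logistic regression, and the names and defaults (`n_models=25`, `seed=0`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D decision stump: threshold t and polarity that minimise training errors."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * (x - t) >= 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: 1 if polarity * (x - t) >= 0 else 0


def bagging(xs, ys, n_models=25, seed=0):
    """Data bagging: one stump per bootstrap resample, prediction by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n points with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Variable Selection Bagging follows the same pattern but resamples input variables instead of observations, which needs more than one input dimension and is omitted here.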

STATISTICAL NOISE BAND REMOVAL FOR SURFACE CLUSTERING OF HYPERSPECTRAL DATA

  • Huan, Nguyen Van; Kim, Hak-Il
    • Proceedings of the KSRS Conference / 2008.10a / pp.111-114 / 2008
  • Noise bands can deform the typical shape of a spectrum and degrade clustering accuracy. This paper proposes a statistical approach to removing noise bands from hyperspectral data, using the correlation coefficient between bands as an indicator. When each band is treated as a random variable, two adjacent signal bands in hyperspectral data are highly correlated, whereas a noise band produces low correlation. For clustering, the unsupervised $\kappa$-nearest neighbor clustering method is implemented with three well-accepted spectral matching measures: ED (Euclidean distance), SAM (spectral angle mapper), and SID (spectral information divergence). The paper also proposes a hierarchical scheme for combining these measures. Finally, a separability assessment based on the between-class and within-class scatter matrices is used to evaluate the applicability of the proposed noise band removal method. The paper also compares the spectral matching measures.
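The band-screening idea and the three spectral matching measures can be sketched as follows. The rule that flags a band when its correlation with every adjacent band falls below a threshold, and the threshold value 0.9, are assumptions for illustration; the paper's $\kappa$-nearest neighbor clustering and hierarchical combination scheme are not reproduced. SID is computed as the symmetric Kullback-Leibler divergence of the sum-normalized spectra, so it requires strictly positive values. Function names are hypothetical.

```python
from math import acos, log, sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_noise_bands(bands, threshold=0.9):
    """bands[k] holds band k's values over all pixels (each band needs nonzero variance).

    Flag band k when its correlation with every adjacent band is below `threshold`.
    """
    r = [pearson(bands[k], bands[k + 1]) for k in range(len(bands) - 1)]
    noisy = []
    for k in range(len(bands)):
        neighbours = []
        if k > 0:
            neighbours.append(r[k - 1])  # correlation with band k-1
        if k < len(r):
            neighbours.append(r[k])  # correlation with band k+1
        if all(c < threshold for c in neighbours):
            noisy.append(k)
    return noisy


def ed(x, y):
    """Euclidean distance between two spectra."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sam(x, y):
    """Spectral angle (radians) between two spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    cos = dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))
    return acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def sid(x, y):
    """Spectral information divergence: symmetric KL divergence of normalised spectra."""
    sx, sy = sum(x), sum(y)
    p = [a / sx for a in x]
    q = [b / sy for b in y]
    return sum(pi * log(pi / qi) + qi * log(qi / pi) for pi, qi in zip(p, q))
```

Because an edge band has only one neighbor, a noise band next to the first or last band would also flag that edge band under this rule, so a practical version would need a tie-break. SAM and SID ignore overall brightness, so a spectrum and its scaled copy score zero on both, while ED does not.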

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design

  • Shin, Hyung-Won; Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers / v.27 no.1 / pp.47-53 / 2001
  • In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression under various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is unknown, we treat the input-output function as a noise factor in a Taguchi design to make the results more practical. The experiments indicate the following. When the variance is at the medium level, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When input variables are strongly correlated, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, and disappointingly, the Parameter Combining algorithm performs worst.

Analysis of COVID-19 Context-awareness based on Clustering Algorithm

  • Lee, Kangwhan
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.755-762 / 2022
  • This paper proposes a clustering algorithm that enables more efficient learning-based prediction of COVID-19 disease by using context-aware attribute information within clusters. Typically, clustering COVID-19 cases makes it possible to classify the interrelationships within disease cluster information during the clustering process. However, when new or newly processed information is treated as contaminating the comparative interrelationship information, the clustered data can become a degrading factor. To address this problem, we developed a clustering algorithm that extracts disease correlation information using the K-means algorithm. Using accumulated information and interrelationship clustering, the proposed algorithm analyzes possible disease correlation clusters and their center points according to the attributes of the disease clusters. Simulation results show that the proposed algorithm improves the adaptability and prediction accuracy of the classification management system when learning from groups of multiple COVID-19 disease attributes.

ANGULAR CLUSTERING OF FIR-SELECTED GALAXIES IN THE AKARI ALL-SKY SURVEY

  • Pollo, A.; Takeuchi, T.T.; Suzuki, T.L.; Oyabu, S.
    • Publications of The Korean Astronomical Society / v.27 no.4 / pp.343-344 / 2012
  • We present the first measurement of the angular two-point correlation function for AKARI 90 $\mu$m point sources detected outside the Milky Way plane and selected as candidate extragalactic sources. This is the first measurement of the large-scale angular clustering of far-infrared-selected galaxies since IRAS. We find a positive clustering signal in both hemispheres extending up to about 40 degrees, with no significant fluctuations at larger scales. The observed correlation function is well fitted by a power law. However, southern galaxies appear to be more strongly clustered than northern ones, and the difference is statistically significant. The reason for this difference, whether technical or physical, has yet to be determined.
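The abstract does not say which estimator was used, so the sketch below assumes the widely used Landy-Szalay form $w(\theta) = (DD - 2DR + RR)/RR$ with normalized pair counts, plus a log-log least-squares fit of the power law $w(\theta) = A\theta^{s}$ (the slope $s$ is often written as $1-\gamma$). Building a random catalogue that matches the survey mask is outside the sketch, pair counting is brute force $O(N^2)$, and all names are hypothetical.

```python
from math import acos, cos, degrees, exp, log, radians, sin


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    c = sin(dec1) * sin(dec2) + cos(dec1) * cos(dec2) * cos(ra1 - ra2)
    return degrees(acos(max(-1.0, min(1.0, c))))  # clamp against rounding error


def pair_counts(a, b, edges, auto):
    """Count pairs whose separation falls in each [edges[k], edges[k+1]) bin.

    a, b are lists of (ra, dec). With auto=True, pass the same catalogue twice;
    each unordered pair is then counted once.
    """
    counts = [0] * (len(edges) - 1)
    for i, (ra1, dec1) in enumerate(a):
        start = i + 1 if auto else 0
        for ra2, dec2 in b[start:]:
            s = angular_separation(ra1, dec1, ra2, dec2)
            for k in range(len(edges) - 1):
                if edges[k] <= s < edges[k + 1]:
                    counts[k] += 1
                    break
    return counts


def landy_szalay(dd, dr, rr, n_d, n_r):
    """Landy-Szalay estimate for one bin from raw pair counts and catalogue sizes."""
    dd_n = dd / (n_d * (n_d - 1) / 2)  # normalised data-data pairs
    dr_n = dr / (n_d * n_r)  # normalised data-random pairs
    rr_n = rr / (n_r * (n_r - 1) / 2)  # normalised random-random pairs
    return (dd_n - 2 * dr_n + rr_n) / rr_n


def fit_power_law(theta, w):
    """Least-squares fit of log w = log A + s * log theta; returns (A, s). Needs w > 0."""
    xs = [log(t) for t in theta]
    ys = [log(v) for v in w]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return exp(my - s * mx), s
```

When the normalized DD, DR, and RR counts are equal, as for an unclustered sample, the estimator returns 0; a positive value means an excess of close pairs over random.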