• Title/Summary/Keyword: Correlation clustering


Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to specifically and robustly evaluate the performance of PCA for clustering analysis, we exploit an improved FCBF (fast correlation-based filter), a feature selection method, for supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

A new Ensemble Clustering Algorithm using a Reconstructed Mapping Coefficient

  • Cao, Tuoqia;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.7
    • /
    • pp.2957-2980
    • /
    • 2020
  • Ensemble clustering commonly integrates multiple basic partitions to obtain a more accurate clustering result than a single partition. However, an inevitable problem is the incomplete transformation from the original space to the integrated space. In this paper, a novel ensemble clustering algorithm using a newly reconstructed mapping coefficient (ECRMC) is proposed. In the algorithm, a newly reconstructed mapping coefficient between objects and micro-clusters is designed based on the principle of increasing information entropy to enhance effective information. This can reduce the information loss in the transformation from micro-clusters to the original space. Then the correlation of the micro-clusters is calculated by the Spearman coefficient. As a result, the revised co-association graph between objects can be built more accurately, because the supplementary information helps ensure the completeness of the whole conversion process. Experimental results demonstrate that the ECRMC clustering algorithm has high performance, effectiveness, and feasibility.
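
The Spearman coefficient mentioned in this abstract is the Pearson correlation of rank-transformed values. The following is a minimal sketch of that general definition only, not the ECRMC algorithm; the function names and the idea of comparing two numeric vectors that describe micro-clusters are assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical descriptor vectors for two micro-clusters.
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])  # monotone relation
```

Because only ranks enter the computation, any strictly monotone relationship between the two vectors yields a coefficient of 1 (or -1 if decreasing).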

A Pattern Consistency Index for Detecting Heterogeneous Time Series in Clustering Time Course Gene Expression Data (시간경로 유전자 발현자료의 군집분석에서 이질적인 시계열의 탐지를 위한 패턴일치지수)

  • Son, Young-Sook;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.371-379
    • /
    • 2005
  • In this paper, we propose a pattern consistency index for detecting heterogeneous time series that deviate from the representative pattern of each cluster in clustering time course gene expression data using the Pearson correlation coefficient. We examine its usefulness by applying this index to serum time course gene expression data from microarrays.
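
The abstract does not give the formula of the pattern consistency index, so the sketch below only illustrates the underlying idea under stated assumptions: each series in a cluster is compared to the cluster's mean profile with the Pearson correlation coefficient, and series falling below a hypothetical threshold are flagged. The function names, the use of the mean as the representative pattern, and the threshold of 0.5 are all illustrative assumptions, not the authors' definition.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_heterogeneous(cluster, threshold=0.5):
    """Return indices of series poorly correlated with the cluster mean profile."""
    length = len(cluster[0])
    # Assumed representative pattern: the pointwise mean of all series.
    mean_profile = [sum(s[t] for s in cluster) / len(cluster) for t in range(length)]
    return [i for i, s in enumerate(cluster) if pearson(s, mean_profile) < threshold]

# Toy time course data (made up for illustration).
cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0],  # deliberately opposite pattern
]
flagged = flag_heterogeneous(cluster)
```

In this toy cluster the fourth series runs against the rising trend shared by the others, so it is the only one flagged.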

Galaxy clustering from the UKIDSS DXS

  • Kim, Jae-U
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.37 no.1
    • /
    • pp.36.1-36.1
    • /
    • 2012
  • Recent wide and deep surveys allow us to investigate the large scale structure of the Universe at high redshift. We present studies of the clustering of high redshift galaxies, using the reprocessed UKIDSS DXS catalogue. We measure the angular correlation function of high redshift galaxies, specifically Extremely Red Objects (EROs). First, we found that their angular correlation functions can be described by a broken power-law. We also found that red or bright samples are more strongly clustered than those having the opposite characteristics, and that old, passive EROs are more clustered than dusty, star-forming EROs. Additionally, the average halo mass and other properties were estimated using the halo model. Finally, the observed clustering of EROs was compared with predictions from the cosmological simulation.


The Design of GA-based TSK Fuzzy Classifier and Its application (GA기반 TSK 퍼지 분류기의 설계 및 응용)

  • 곽근창;김승석;유정웅;전명근
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.233-236
    • /
    • 2001
  • In this paper, we propose a TSK-type fuzzy classifier using PCA (Principal Component Analysis), FCM (Fuzzy C-Means) clustering and a hybrid GA (genetic algorithm). First, input data is transformed by PCA to reduce correlation among the data components. FCM clustering is then applied to obtain an initial TSK-type fuzzy classifier. Parameter identification is performed by AGA (Adaptive Genetic Algorithm) and RLSE (Recursive Least Square Estimate). We applied the proposed method to Iris data classification problems and obtained better performance than previous works.


Correlation between Impervious Surface Area Rate and Urbanization Indicators at the Si-Gun Level (시군단위의 불투수면적률과 도시화 지표의 상관성 분석)

  • Jang, Min-Won;Kim, Hyeonjoon;Choi, Yoonhee;Kim, Hakkwan
    • Journal of Korean Society of Rural Planning
    • /
    • v.29 no.4
    • /
    • pp.55-67
    • /
    • 2023
  • This study investigated the correlation between impervious surface area rate (ISAR) and various urbanization indicators at the si-gun administrative level. For the years 2017 and 2021, we built correlation matrices to examine the relationships between ISAR and eight urbanization indicators: total population, working-age population, residential power consumption, non-agricultural power consumption, paved road length, permitted development area, number of registered vehicles, and cadastral 'Dae' parcel area. Additionally, K-means clustering was employed to classify the 229 si-guns based on the ISAR change patterns. The analysis revealed a significant positive correlation between ISAR and urbanization indicators for both years studied. However, the interannual comparison showed a noticeably weaker correlation between changes in ISAR and urbanization indicators from 2017 to 2021. The K-means analysis also showed that si-guns with higher ISAR values, typically urban areas, demonstrated a weaker correlation, while the cluster consisting mostly of rural areas with lower ISAR displayed stronger correlations. These results suggest that ISAR should be a significant factor for consideration in sustainable rural planning and development strategies.

The clustering of critical points in the evolving cosmic web

  • Shim, Junsup;Codis, Sandrine;Pichon, Christophe;Pogosyan, Dmitri;Cadiou, Corentin
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.46 no.1
    • /
    • pp.47.2-47.2
    • /
    • 2021
  • Focusing on both small separations and baryonic acoustic oscillation scales, the cosmic evolution of the clustering properties of peak, void, wall, and filament-type critical points is measured using two-point correlation functions in ΛCDM dark matter simulations as a function of their relative rarity. A qualitative comparison to the corresponding theory for Gaussian random fields allows us to understand the following observed features: (i) the appearance of an exclusion zone at small separation, whose size depends both on rarity and signature (i.e. the number of negative eigenvalues) of the critical points involved; (ii) the amplification of the baryonic acoustic oscillation bump with rarity and its reversal for cross-correlations involving negatively biased critical points; (iii) the orientation-dependent small-separation divergence of the cross-correlations of peaks and filaments (respectively voids and walls) that reflects the relative loci of such points in the filament's (respectively wall's) eigenframe. The (cross-) correlations involving the most non-linear critical points (peaks, voids) display significant variation with redshift, while those involving less non-linear critical points seem mostly insensitive to redshift evolution, which should prove advantageous to model. The ratios of distances to the maxima of the peak-to-wall and peak-to-void over that of the peak-to-filament cross-correlation are ~√2 and ~√3, respectively, which could be interpreted as the cosmic crystal being on average close to a cubic lattice. The insensitivity to redshift evolution suggests that the absolute and relative clustering of critical points could become a topologically robust alternative to standard clustering techniques when analysing upcoming surveys such as Euclid or the Large Synoptic Survey Telescope (LSST).


A Web Personalized Recommender System Using Clustering-based CBR (클러스터링 기반 사례기반추론을 이용한 웹 개인화 추천시스템)

  • Hong, Tae-Ho;Lee, Hee-Jung;Suh, Bo-Mil
    • Journal of Intelligence and Information Systems
    • /
    • v.11 no.1
    • /
    • pp.107-121
    • /
    • 2005
  • Recently, recommendation systems and collaborative filtering have been studied extensively in both research and practice. However, although product items may have multi-valued attributes, previous studies did not reflect them. To overcome this limitation, this paper proposes a new methodology for recommendation systems. The proposed methodology clusters items by their multi-valued attributes and applies collaborative filtering to provide accurate recommendations. Both user clustering-based CBR and item attribute clustering-based CBR techniques are applied to the collaborative filtering, so that item-to-item correlation is considered as well as user-to-user correlation. Using the multi-valued attribute-based clustering technique for items, the characteristics of items are identified clearly. Extensive experiments have been performed with MovieLens data to validate the proposed methodology. The results show that the proposed methodology outperforms the benchmarked methodologies: Case Based Reasoning Collaborative Filtering (CBR_CF) and User Clustering Case Based Reasoning Collaborative Filtering (UC_CBR_CF).


Temperature network analysis of the Korean peninsula linking by DCCA methodology (DCCA 방법으로 연결된 한반도의 기온 네트워크 분석)

  • Min, Seungsik
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1445-1458
    • /
    • 2016
  • This paper derives correlation coefficients using the detrended cross-correlation analysis (DCCA) method for 59 regional temperature series covering the 40 years from 1976 to 2015. The average, maximum, and minimum temperature series are analyzed in 4-year units. We consider a temperature correlation to exist between two regions during a unit period when their correlation coefficient is greater than or equal to 0.9, and we construct a network by linking such pairs of regions. Based on network theory, the average path length, clustering coefficient, assortativity, and modularity were derived. As a result, it was found that the temperature network satisfies the small-world property and exhibits assortativity and modularity.
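
The network construction described in this abstract can be sketched as follows: two regions are linked when their correlation coefficient is at least 0.9, and the average clustering coefficient is then computed on the resulting graph. This is only an illustrative sketch; it does not compute the DCCA coefficient itself, and the 4x4 correlation matrix and the function names are hypothetical.

```python
from itertools import combinations

def build_network(corr, threshold=0.9):
    """Link regions i and j when corr[i][j] >= threshold (undirected graph)."""
    n = len(corr)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if corr[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

# Hypothetical correlation coefficients between four regions.
corr = [
    [1.00, 0.95, 0.92, 0.50],
    [0.95, 1.00, 0.91, 0.93],
    [0.92, 0.91, 1.00, 0.40],
    [0.50, 0.93, 0.40, 1.00],
]
network = build_network(corr)
avg_cc = average_clustering(network)
```

With these made-up values the edges are 0-1, 0-2, 1-2, and 1-3, so regions 0, 1, and 2 form a triangle while region 3 hangs off region 1.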

Selection of Optimal Sensor Locations for Thermal Error Model of Machine tools (공작기계 열오차 모델의 최적 센서위치 선정)

  • 안중용
    • Proceedings of the Korean Society of Machine Tool Engineers Conference
    • /
    • 1999.10a
    • /
    • pp.345-350
    • /
    • 1999
  • The effectiveness of software error compensation for thermally induced machine tool errors relies on the prediction accuracy of the pre-established thermal error models. The selection of optimal sensor locations is the most important step in establishing these empirical models. In this paper, a methodology for the selection of optimal sensor locations is proposed to establish a robust linear model that is not subject to collinearity. Correlation coefficient and time delay are used as thermal parameters for optimal sensor location. First, thermal deformation and temperatures are measured while the machine tool is excited by sinusoidal heat input. Then, after correlation coefficients and time delays are calculated from the measured data, the optimal sensor locations are selected through hard c-means clustering and a sequential selection method. The validity of the proposed methodology is verified through the estimation of thermal expansion along the Z-axis caused by spindle rotation.
