• Title/Summary/Keyword: Correlation-based clustering algorithm

Property-based Hierarchical Clustering of Peers using Mobile Agent for Unstructured P2P Systems (비구조화 P2P 시스템에서 이동에이전트를 이용한 Peer의 속성기반 계층적 클러스터링)

  • Salvo, MichaelAngelG.;Mateo, RomeoMarkA.;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.10 no.4 / pp.189-198 / 2009
  • Unstructured peer-to-peer systems are the most commonly used on today's Internet. However, file placement in these systems is random, and no correlation exists between peers and their contents, so there is no guarantee that flooding queries will find the desired data. In this paper, we propose clustering the nodes in unstructured P2P systems with the agglomerative hierarchical clustering algorithm to improve the search method. We compared the delay time of clustering the nodes between our proposed algorithm and the k-means clustering algorithm. We also simulated the delay time of locating data in a network topology and recorded the system overhead using our proposed algorithm, k-means clustering, and no clustering. Simulation results show that the delay time of our proposed algorithm is shorter than that of the other methods and that resource overhead is also reduced.
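
    The abstract names agglomerative hierarchical clustering but gives no details, so the following is only a minimal sketch of that general technique. It assumes each peer is described by a numeric property vector, uses Euclidean distance with single linkage, and stops at a requested cluster count; the function name `agglomerative` and the sample points are hypothetical and are not taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, num_clusters):
    """Merge the two closest clusters (single linkage) until num_clusters remain.

    points: list of equal-length numeric tuples (assumed peer property vectors).
    Returns a list of clusters, each a list of indices into points.
    """
    target = max(1, num_clusters)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > target:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the nearest members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


# Hypothetical peers described by two numeric properties.
peers = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = agglomerative(peers, 3)
```

    With these sample points, the two nearby pairs merge first and the isolated peer stays in a cluster of its own. A query could then be routed to the cluster whose properties match it instead of being flooded to every peer, which is the motivation the abstract describes.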

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won;Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers / v.27 no.1 / pp.47-53 / 2001
  • In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression, taking into account various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) input-output function. Because the relationship between input and output is unknown in practice, we use a Taguchi design that treats the input-output function as a noise factor, which improves the practicality of our results. The experimental results indicate the following. When the level of variance is medium, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, and to our disappointment, the Parameter Combining algorithm appears to be the worst.

Lossless Compression for Hyperspectral Images based on Adaptive Band Selection and Adaptive Predictor Selection

  • Zhu, Fuquan;Wang, Huajun;Yang, Liping;Li, Changguo;Wang, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3295-3311 / 2020
  • With the wide application of hyperspectral images, compressing them has become increasingly important. The conventional recursive least squares (CRLS) algorithm has great potential for lossless compression of hyperspectral images. The prediction accuracy of CRLS is closely related to the correlations between the reference bands and the current band, and to the similarity between pixels in the prediction context. Based on this characteristic, we present an improved CRLS with adaptive band selection and adaptive predictor selection (CRLS-ABS-APS). First, a k-means clustering algorithm based on the spectral vector correlation coefficient is employed to generate a clustering map. Next, an adaptive band selection strategy based on the inter-spectral correlation coefficient is adopted to select the reference bands for each band. Then, an adaptive predictor selection strategy based on the clustering map is adopted to select the optimal CRLS predictor for each pixel. In addition, a double snake scan mode is used to further improve the similarity of the prediction context, and a recursive average estimation method is used to accelerate the local average calculation. Finally, the prediction residuals are entropy encoded by an arithmetic encoder. Experiments on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) 2006 data set show that CRLS-ABS-APS achieves average bit rates of 3.28 bpp, 5.55 bpp and 2.39 bpp on the three subsets, respectively. The results indicate that CRLS-ABS-APS effectively improves compression performance with lower computational complexity and outperforms current state-of-the-art methods.
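
    The band selection step can be illustrated with a small sketch. The abstract does not give the selection rule, so this version makes its own assumptions: each band is a flat list of pixel values, only earlier (already coded) bands are candidates, and the `k` bands with the largest absolute Pearson correlation to the current band are chosen. The names `corr` and `select_reference_bands`, the value of `k`, and the sample data are hypothetical.

```python
from math import sqrt


def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) *
               sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # constant band: treat as uncorrelated


def select_reference_bands(bands, current, k=2):
    """Indices of up to k earlier bands most correlated with bands[current]."""
    scored = [(abs(corr(bands[j], bands[current])), j) for j in range(current)]
    # Strongest correlation first; ties go to the spectrally nearer band.
    scored.sort(key=lambda t: (-t[0], -t[1]))
    return [j for _, j in scored[:k]]


# Four tiny hypothetical "bands" of four pixels each.
bands = [
    [1, 2, 3, 4],
    [4, 1, 3, 2],
    [2, 4, 6, 8.5],
    [1, 2, 3, 5],
]
refs = select_reference_bands(bands, current=3, k=2)
```

    For band 3, this picks bands 2 and 0 and skips the weakly correlated band 1. A predictor such as CRLS would then be driven by the selected bands only.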

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we apply an improved FCBF (fast correlation-based filter) feature selection method to supervised clustering data sets and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on supervised data sets show that the performance of PCA is very poor for large-scale features.

Hybrid-clustering game Algorithm for Resource Allocation in Macro-Femto HetNet

  • Ye, Fang;Dai, Jing;Li, Yibing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1638-1654 / 2018
  • The heterogeneous network (HetNet) has been one of the key technologies in Long Term Evolution-Advanced (LTE-A) as capacity and coverage demands grow. However, the introduction of femtocells has brought serious co-layer and cross-layer interference, which has been a major factor affecting system throughput. It is generally acknowledged that resource allocation has a significant impact on suppressing interference and improving system performance. In this paper, we propose a hybrid-clustering algorithm based on the Matérn hard-core process (MHP) to restrain both kinds of co-channel interference in the HetNet. Because the hexagonal grid model and the homogeneous Poisson point process model, whose points are distributed completely at random, are impractical for establishing the system model, a HetNet model based on the MHP is adopted to capture the negative correlation in the distribution of base stations. Based on this system model, the spectrum sharing problem with restricted spectrum resources is further analyzed. On the basis of location information and the interference relations of base stations, a hybrid clustering method that takes into account fairness between the two types of base stations is first proposed. Then, an auction mechanism is discussed to achieve spectrum sharing inside each cluster and avoid wasting spectrum resources. By combining clustering theory with the auction mechanism, the proposed algorithm can be applied to restrain the cross-layer and co-layer interference of a HetNet with a high density of base stations. Simulation results show that spectral efficiency and system throughput increase to a certain degree.

Clustering-Based Recommendation Using Users' Preference (사용자 선호도를 사용한 군집 기반 추천 시스템)

  • Kim, Younghyun;Shin, Won-Yong
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.2 / pp.277-284 / 2017
  • In a flood of information, most users want to receive proper recommendations. If a recommender system fails to suggest appropriate content, quality of experience (QoE) decreases drastically. In this paper, we propose a recommender system based on intra-cluster users' item preferences to improve recommendation accuracy indices such as precision, recall, and F1 score. To this end, users are first divided into several clusters based on the actual rating data and the Pearson correlation coefficient (PCC). Afterwards, we give each item an advantage or disadvantage according to the preference tendency of users within the same cluster. Specifically, an item receives an advantage/disadvantage when its average rating by other users within the same cluster is above/below a predefined threshold. The proposed algorithm shows a statistically significant performance improvement over the item-based collaborative filtering algorithm with no clustering in terms of recommendation accuracy indices such as precision, recall, and F1 score.
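
    The two steps described above (grouping users by PCC, then adjusting an item by the cluster's average rating) can be sketched as follows. This is an illustration under stated assumptions and not the authors' procedure: the greedy seed-based grouping, the `min_pcc` cutoff, the `threshold` and `bonus` values, and all function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """PCC over the items both users rated; 0.0 when it is undefined."""
    common = [i for i in u if i in v]
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = (sqrt(sum((u[i] - mu) ** 2 for i in common)) *
           sqrt(sum((v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0


def cluster_users(ratings, min_pcc=0.5):
    """Assumed greedy grouping: join the first cluster whose seed user
    (its first member) has PCC >= min_pcc, else start a new cluster."""
    clusters = []
    for user in ratings:
        for members in clusters:
            if pearson(ratings[user], ratings[members[0]]) >= min_pcc:
                members.append(user)
                break
        else:
            clusters.append([user])
    return clusters


def item_adjustment(item, user, cluster, ratings, threshold=3.5, bonus=0.5):
    """+bonus / -bonus when the other cluster members' average rating of
    the item is above / below the threshold; 0.0 if nobody rated it."""
    others = [ratings[m][item] for m in cluster
              if m != user and item in ratings[m]]
    if not others:
        return 0.0
    mean = sum(others) / len(others)
    if mean > threshold:
        return bonus
    if mean < threshold:
        return -bonus
    return 0.0


# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "a": {"x": 5, "y": 4, "z": 1},
    "b": {"x": 4, "y": 5, "z": 2, "w": 5},
    "c": {"x": 1, "y": 2, "z": 5, "w": 1},
}
clusters = cluster_users(ratings)
boost = item_adjustment("w", "a", clusters[0], ratings)
```

    Here users "a" and "b" end up in the same cluster, and item "w", which "a" has not rated, receives an advantage because "b" rated it above the threshold. The adjustment would be added to whatever score the underlying collaborative filter predicts.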

A Design of GA-based TSK Fuzzy Classifier and Its Application (GA 기반 TSK 퍼지 분류기의 설계와 응용)

  • 곽근창;김승석;유정웅;김승석
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.8 / pp.754-759 / 2001
  • In this paper, we propose a TSK (Takagi-Sugeno-Kang) type fuzzy classifier using PCA (Principal Component Analysis), FCM (Fuzzy c-Means) clustering, ANFIS (Adaptive Neuro-Fuzzy Inference System), and a hybrid GA (Genetic Algorithm). First, the input data is transformed by PCA to reduce the correlation among the data components. FCM clustering is then applied to obtain an initial TSK-type fuzzy classifier. Parameter identification is performed by AGA (Adaptive GA) and RLSE (Recursive Least Square Estimate). Finally, we applied the proposed method to Iris data classification problems and obtained better performance than previous works.

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.103-125 / 2022
  • In this paper, we analyze the time series data of the case and death counts of COVID-19, which broke out in China in December 2019. The study period is the lockdown of Wuhan. We use functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we use a clustering method based on the Expectation-Maximization (EM) algorithm to run a cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. In addition, we compare the clustering results with some publicly available migration data.

Fuzzy system construction based on Genetic Algorithms and fuzzy clustering

  • Kwak, Keun-Chang;Kim, Seoung-Suk;Ryu, Jeong-Woong;Chun, Myung-Geun
    • 제어로봇시스템학회:학술대회논문집 / 2002.10a / pp.109.6-109 / 2002
  • In this paper, a scheme for fuzzy system construction using a GA (genetic algorithm) and the FCM (Fuzzy c-means) clustering algorithm is proposed for TSK (Takagi-Sugeno-Kang) type fuzzy systems. In the structure identification, the input data is transformed by PCA (Principal Component Analysis) to reduce the correlation among input data components. Then, the number of fuzzy rules is obtained by a given performance criterion. In the parameter identification, the premise parameters are optimally searched by the GA, while the consequent parameters are estimated by RLSE (Recursive Least Square Estimate) to reduce the search space. From this, one can systematically obtain optimal parameter and the v..

Development of Sasang Type Diagnostic Test with Neural Network (신경망을 사용한 사상체질 진단검사 개발 연구)

  • Chae, Han;Hwang, Sang-Moon;Eom, Il-Kyu;Kim, Byoung-Chul;Kim, Young-In;Kim, Byung-Joo;Kwon, Young-Kyu
    • Journal of Physiology & Pathology in Korean Medicine / v.23 no.4 / pp.765-771 / 2009
  • Medical informatics for clustering Sasang types from collected clinical data is important for personalized medicine, but it has not yet been thoroughly studied. The purpose of this study was to examine the usefulness of a neural network data mining algorithm for traditional Korean medicine. We used a Kohonen neural network, the Self-Organizing Map (SOM), to analyze biomedical information after data pre-processing, and calculated the validity index as the percentage correctly predicted and the type-specific sensitivity. We extracted 12 data fields from 30 after data pre-processing with correlation analysis and latent functional relationship analysis. The profile of the Myers-Briggs Type Indicator and Bio-Impedance Analysis data clustered with the SOM was similar to that of the original measurements. The percentage correctly predicted was 56%, and the sensitivities for the So-Yang, Tae-Eum, and So-Eum types were 56%, 48%, and 61%, respectively. This study showed that the neural network algorithm for clustering Sasang types based on clinical data is useful for the Sasang type diagnostic test itself. We discussed the importance of data pre-processing and the clustering algorithm for the validity of medical devices in traditional Korean medicine.