• Title/Summary/Keyword: similarity based clustering


Fuzzy Clustering with Genre Preference for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.5
    • /
    • pp.99-106
    • /
    • 2020
  • The scalability problem inherent in collaborative filtering-based recommender systems has been an issue in related studies over the past decades. Clustering is a well-known technique for handling this problem, but it has not been actively studied because of its low performance. This paper adopts a clustering method to overcome the scalability problem, an inherent drawback of collaborative filtering systems. Furthermore, to handle the performance degradation caused by applying clustering to collaborative filtering, we adopt two strategies: first, we use fuzzy clustering; second, we propose and apply a similarity estimation method based on user preference for movie genres. The proposed method is evaluated through experiments and compared with several previous relevant methods in terms of major performance metrics. Experimental results show that the proposed method demonstrated superior performance in prediction and rank accuracies, and performance comparable to the best method in our experiments in recommendation accuracy.

NOGSEC: A NOnparametric method for Genome SEquence Clustering (녹섹(NOGSEC): A NOnparametric method for Genome SEquence Clustering)

  • 이영복;김판규;조환규
    • Korean Journal of Microbiology
    • /
    • v.39 no.2
    • /
    • pp.67-75
    • /
    • 2003
  • One major topic in comparative genomics is predicting functional annotation by classifying protein sequences. Computational approaches to function prediction include protein structure prediction, sequence alignment, and domain or binding site prediction. This paper addresses another computational approach: searching for sets of homologous sequences in a sequence similarity graph. Methods based on similarity graphs do not need prior knowledge about the sequences, but depend largely on the researcher's subjective threshold settings. In this paper, we propose a genome sequence clustering method based on iterative testing and graph decomposition, together with a simple method for calculating a strict threshold that has biochemical meaning. The proposed method was applied to known bacterial genome sequences, and the result is presented alongside that of the BAG algorithm. The resulting clusters lack some completeness, but the confidence level is very high and the method does not need user-defined thresholds.

Trajectory Clustering in Road Network Environment (도로 네트워크 환경을 위한 궤적 클러스터링)

  • Bak, Ji-Haeng;Won, Jung-Im;Kim, Sang-Wook
    • The KIPS Transactions:PartD
    • /
    • v.16D no.3
    • /
    • pp.317-326
    • /
    • 2009
  • Recently, there have been many research efforts on trajectory information. Most of them focus on objects moving in Euclidean space. Many real-world applications such as telematics, however, deal with objects that move only over road networks, where movement is highly restricted. Thus, the existing methods targeting Euclidean space cannot be directly applied to the road network space. This paper proposes a new clustering scheme for a large volume of trajectory information of objects moving over road networks. To this end, we first define a trajectory on a road network as the sequence of road segments that a moving object has passed. Next, we propose a similarity measurement scheme that judges the degree of similarity by considering the total length of matched road segments. Based on this similarity measurement, we propose a new clustering algorithm for trajectories by modifying and adjusting the FastMap and hierarchical clustering schemes. To evaluate the performance of the proposed clustering scheme, we also develop a trajectory generator based on the observation that most objects tend to move from the starting point to the destination along their shortest path, and perform a variety of experiments using the generated trajectories. The results show that our scheme achieves an accuracy of over 95% relative to clustering judged by human beings.
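
The similarity idea in this abstract (trajectories as sequences of road segments, compared by the total length of matched segments) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the normalization by the combined length of both trajectories, the function name `trajectory_similarity`, and the `segment_length` lookup table are all assumptions made for illustration only.

```python
def trajectory_similarity(traj_a, traj_b, segment_length):
    """Fraction of road length that two trajectories share.

    traj_a, traj_b: sequences of road-segment IDs.
    segment_length: dict mapping segment ID -> length (hypothetical input).
    Dividing by the length of the union of segments is an assumption of
    this sketch, not the formula from the paper.
    """
    set_a, set_b = set(traj_a), set(traj_b)
    matched = sum(segment_length[s] for s in set_a & set_b)
    total = sum(segment_length[s] for s in set_a | set_b)
    return matched / total if total else 0.0


# Hypothetical road network: four segments with lengths in meters.
lengths = {"s1": 100.0, "s2": 250.0, "s3": 150.0, "s4": 500.0}
sim = trajectory_similarity(["s1", "s2", "s3"], ["s2", "s3", "s4"], lengths)
print(sim)  # 400 m matched out of 1000 m in total -> 0.4
```

A pairwise matrix of such scores could then be passed to any hierarchical clustering routine; the FastMap-based variant described in the abstract is not reproduced here.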

Cause Diagnosis Method of Semiconductor Defects using Block-based Clustering and Histogram χ² Distance (블록 기반 클러스터링과 히스토그램 카이 제곱 거리를 이용한 반도체 결함 원인 진단 기법)

  • Lee, Young-Joo;Lee, Jeong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.9
    • /
    • pp.1149-1155
    • /
    • 2012
  • In this paper, we propose a method for diagnosing the cause of semiconductor defects from industrial semiconductor images. Our method first constructs a feature database (DB) of defect images. Next, the defect and input images are subdivided into uniform blocks, and block similarity is measured using the histogram chi-square distance after computing color histograms. The blocks found in each image are then merged into connected objects using clustering. Finally, the most similar defect image in the feature DB is retrieved, together with its defect cause, by measuring cluster similarity based on the features of each cluster. Our method was validated by calculating the search accuracy of the n output images with the highest similarity. With n = 1, 2, 3, the search accuracy was measured to be 100% regardless of defect category. Our method could be used in industrial applications.
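
The block comparison step relies on the chi-square distance between two histograms. A minimal sketch follows, assuming the common symmetric form with a factor of 1/2 and normalized histograms; the abstract does not state which variant the authors use, and the function names here are illustrative.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1 (empty histograms stay as-is)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def chi_square_distance(h1, h2):
    """Symmetric chi-square distance: 0.5 * sum((a - b)^2 / (a + b)).

    Bins that are empty in both histograms are skipped to avoid division
    by zero. For normalized histograms the result lies in [0, 1].
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


# Hypothetical 3-bin color histograms of two image blocks (pixel counts).
block_a = normalize([10, 10, 0])
block_b = normalize([10, 0, 10])
print(chi_square_distance(block_a, block_b))  # 0.5
```

A smaller distance means more similar blocks, so a threshold on this value (not specified in the abstract) would decide which blocks count as matches.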

Lossless Compression for Hyperspectral Images based on Adaptive Band Selection and Adaptive Predictor Selection

  • Zhu, Fuquan;Wang, Huajun;Yang, Liping;Li, Changguo;Wang, Sen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.8
    • /
    • pp.3295-3311
    • /
    • 2020
  • With the wide application of hyperspectral images, compressing them has become increasingly important. The conventional recursive least squares (CRLS) algorithm has great potential for lossless compression of hyperspectral images. The prediction accuracy of CRLS is closely related to the correlations between the reference bands and the current band, and to the similarity between pixels in the prediction context. Based on this characteristic, we present an improved CRLS with adaptive band selection and adaptive predictor selection (CRLS-ABS-APS). First, a k-means clustering algorithm based on the spectral vector correlation coefficient is employed to generate a clustering map. Next, an adaptive band selection strategy based on the inter-spectral correlation coefficient is adopted to select the reference bands for each band. Then, an adaptive predictor selection strategy based on the clustering map is adopted to select the optimal CRLS predictor for each pixel. In addition, a double snake scan mode is used to further improve the similarity of the prediction context, and a recursive average estimation method is used to accelerate the local average calculation. Finally, the prediction residuals are entropy-encoded by an arithmetic encoder. Experiments on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) 2006 data set show that CRLS-ABS-APS achieves average bit rates of 3.28 bpp, 5.55 bpp and 2.39 bpp on the three subsets, respectively. The results indicate that CRLS-ABS-APS effectively improves compression performance with lower computational complexity, and outperforms the current state-of-the-art methods.

Development of A Web Mining System Based On Document Similarity (문서 유사도 기반의 웹 마이닝 시스템 개발)

  • 이강찬;민재홍;박기식;임동순;우훈식
    • The Journal of Society for e-Business Studies
    • /
    • v.7 no.1
    • /
    • pp.75-86
    • /
    • 2002
  • In this study, drawing on our development experience, we propose design issues and a structure for a web mining system, and develop a system for knowledge integration in World Wide Web environments. The developed system consists of three main functions: 1) gathering documents using a search agent; 2) determining similarity coefficients between any two documents from term frequencies; 3) clustering documents based on the similarity coefficients. We believe the developed system can be used for knowledge discovery in relatively narrow domains, such as news classification and index term generation in knowledge management.

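
The second function of the web mining system above (similarity coefficients between two documents from term frequencies) can be sketched as follows. The abstract does not name the coefficient, so cosine similarity is used here as an assumption, and the whitespace tokenizer is a deliberate simplification.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count lowercase whitespace-separated terms (simplified tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors.

    Returns 0.0 when either document has no terms.
    """
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, for illustration only.
doc1 = term_frequencies("stock market news stock prices")
doc2 = term_frequencies("stock prices fall in market")
doc3 = term_frequencies("football match result")
print(cosine_similarity(doc1, doc2) > cosine_similarity(doc1, doc3))  # True
```

Documents whose pairwise coefficient exceeds a chosen cutoff could then be grouped in the clustering step; the cutoff and the clustering procedure are not specified in the abstract.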

An Adaptive Clustering Algorithm Based on Genetic Algorithm (유전자 알고리즘 기반 적응 군집화 알고리즘)

  • Park Namhyun;Ahn Chang Wook;Ramakrishna R.S.
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.11a
    • /
    • pp.459-462
    • /
    • 2004
  • This paper proposes a genetically inspired adaptive clustering algorithm. The algorithm automatically discovers the actual number of clusters and efficiently performs clustering without unduly compromising cluster purity. Chromosome encoding that ensures the correct number of clusters and cluster purity is discussed. The required fitness function is designed on the basis of modified similarity criteria and genetic operators. These are incorporated into the proposed adaptive clustering algorithm. Experimental results show the efficiency of the clustering algorithm on synthetic and real-world data sets.


Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we use an improved FCBF (fast correlation-based filter), a feature selection method, on supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

Efficient Time-Series Similarity Measurement and Ranking Based on Anomaly Detection (이상탐지 기반의 효율적인 시계열 유사도 측정 및 순위화)

  • Ji-Hyun Choi;Hyun Ahn
    • Journal of Internet Computing and Services
    • /
    • v.25 no.2
    • /
    • pp.39-47
    • /
    • 2024
  • Time series analysis is widely employed by many organizations to solve business problems, as it extracts various information and insights from chronologically ordered data. Among its applications, measuring time series similarity is a step that identifies time series with similar patterns, which is very important in applications such as time series search and clustering. In this study, we propose an efficient method for measuring time series similarity that focuses on anomalies rather than the entire series. We validate the proposed method by measuring and analyzing the rank correlation between the similarity measure for the set of subsets extracted by anomaly detection and the similarity measure for the whole time series. Experimental results, especially with stock time series data and an anomaly proportion of 10%, demonstrate a Spearman's rank correlation coefficient of up to 0.9. In conclusion, the proposed method can significantly reduce the computation cost of measuring time series similarity, while providing reliable time series search and clustering results.
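
The validation described above compares two rankings with Spearman's rank correlation coefficient. The sketch below computes it as the Pearson correlation of the ranks, with tied values receiving their average rank. The similarity scores are invented for illustration and are not results from the paper.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical similarity scores of four candidate series to one query:
# computed on the whole series vs. on anomaly-based subsets only.
full_series_sim = [0.9, 0.7, 0.4, 0.2]
anomaly_only_sim = [0.8, 0.85, 0.3, 0.1]
rho = spearman(full_series_sim, anomaly_only_sim)
print(round(rho, 3))  # 0.8
```

A value close to 1 indicates that ranking candidates by the cheaper anomaly-based similarity reproduces the ranking obtained from the whole series.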

SDN-Based Hierarchical Agglomerative Clustering Algorithm for Interference Mitigation in Ultra-Dense Small Cell Networks

  • Yang, Guang;Cao, Yewen;Esmailpour, Amir;Wang, Deqiang
    • ETRI Journal
    • /
    • v.40 no.2
    • /
    • pp.227-236
    • /
    • 2018
  • Ultra-dense small cell networks (UD-SCNs) have been identified as a promising scheme for next-generation wireless networks capable of meeting the ever-increasing demand for higher transmission rates and better quality of service. However, UD-SCNs will inevitably suffer from severe interference among the small cell base stations, which will lower their spectral efficiency. In this paper, we propose a software-defined networking (SDN)-based hierarchical agglomerative clustering (SDN-HAC) framework, which leverages SDN to centrally control all sub-channels in the network, and decides on cluster merging using a similarity criterion based on a suitability function. We evaluate the proposed algorithm through simulation. The obtained results show that the proposed algorithm performs well and improves system payoff by 18.19% and 436.34% when compared with the traditional network architecture algorithms and non-cooperative scenarios, respectively.