• Title/Summary/Keyword: similarity based clustering

Search Result 322, Processing Time 0.024 seconds

Regional Grouping of Transmission System Using the Sequential Clustering Technique (순차적 클러스터링기법을 이용한 송전 계통의 지역별 그룹핑)

  • Kim, Hyun-Houng;Lee, Woo-Nam;Park, Jong-Bae;Shin, Joong-Rin;Kim, Jin-Ho
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.911-917
    • /
    • 2009
  • This paper introduces a sequential clustering technique as a tool for an effective grouping of transmission systems. The interconnected network system retains information about the location of each line. With this information, this paper aims to carry out initial clustering through the transmission usage rate, compare the similarity measures of regional information with the similarity measures of location price, and introduce the techniques of the clustering method. This transmission usage rate uses power flow based on congestion costs and similarity measurements using the FCM(Fuzzy C-Mean) algorithm. This paper also aims to prove the propriety of the proposed clustering method by comparing it with existing clustering methods that use the similarity measurement system. The proposed algorithm is demonstrated through the IEEE 39-bus RTS and Korea power system.

A Hierarchical Clustering Algorithm Using Extended Sequence Element-based Similarity Measure (확장된 시퀀스 요소 기반의 유사도를 이용한 계층적 클러스터링 알고리즘)

  • Oh, Seung-Joon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.5 s.43
    • /
    • pp.321-327
    • /
    • 2006
  • Recently there has been enormous growth in the amount of commercial and scientific data. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a similarity measure and a method for clustering such sequence datasets. Especially, we present an extended concept of the measure of similarity, which considers various conditions. Using a splice dataset, we show that the quality of clusters generated by our proposed clustering algorithm is better than that of clusters produced by traditional clustering algorithms.

  • PDF

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.6
    • /
    • pp.1353-1359
    • /
    • 2010
  • Basic objective in cluster analysis is to discover natural groupings of items. In general, clustering is conducted based on some similarity (or dissimilarity) matrix or the original input data. Various measures of similarities between objects are developed. In this paper, we consider a clustering of huge categorical real data set which shows the aspects of time-location-activity of Korean people. Some useful similarity measure for the data set, are developed and adopted for the categorical variables. Hierarchical and nonhierarchical clustering method are applied for the considered data set which is huge and consists of many categorical variables.

An Efficient Conceptual Clustering Scheme (효율적인 개념 클러스터링 기법)

  • Yang, Gi-Chul
    • Journal of Korea Entertainment Industry Association
    • /
    • v.14 no.4
    • /
    • pp.349-354
    • /
    • 2020
  • This paper, firstly, propose a new Clustering scheme Based on Conceptual graphs (CBC) that can describe objects freely and can perform clustering efficiently. The conceptual clustering is one of machine learning technique. The similarity among the objects in conceptual clustering are decided on the bases of concept membership, unlike the general clustering scheme which decide the similarity without considering the context or environment of the objects. A new conceptual clustering scheme, CBC, which can perform efficient conceptual clustering by describing various objects freely with conceptual graphs is introduced in this paper.

Community Detection using Closeness Similarity based on Common Neighbor Node Clustering Entropy

  • Jiang, Wanchang;Zhang, Xiaoxi;Zhu, Weihua
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.8
    • /
    • pp.2587-2605
    • /
    • 2022
  • In order to efficiently detect community structure in complex networks, community detection algorithms can be designed from the perspective of node similarity. However, the appropriate parameters should be chosen to achieve community division, furthermore, these existing algorithms based on the similarity of common neighbors have low discrimination between node pairs. To solve the above problems, a noval community detection algorithm using closeness similarity based on common neighbor node clustering entropy is proposed, shorted as CSCDA. Firstly, to improve detection accuracy, common neighbors and clustering coefficient are combined in the form of entropy, then a new closeness similarity measure is proposed. Through the designed similarity measure, the closeness similar node set of each node can be further accurately identified. Secondly, to reduce the randomness of the community detection result, based on the closeness similar node set, the node leadership is used to determine the most closeness similar first-order neighbor node for merging to create the initial communities. Thirdly, for the difficult problem of parameter selection in existing algorithms, the merging of two levels is used to iteratively detect the final communities with the idea of modularity optimization. Finally, experiments show that the normalized mutual information values are increased by an average of 8.06% and 5.94% on two scales of synthetic networks and real-world networks with real communities, and modularity is increased by an average of 0.80% on the real-world networks without real communities.

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems
    • /
    • v.14 no.6
    • /
    • pp.1438-1444
    • /
    • 2018
  • Recently, with the development of Internet technologies and propagation of smart devices, use of microblogs such as Facebook, Twitter, and Instagram has been rapidly increasing. Many users check for new information on microblogs because the content on their timelines is continually updating. Therefore, clustering algorithms are necessary to arrange the content of microblogs by grouping them for a user who wants to get the newest information. However, microblogs have word limits, and it has there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that not only measures the similarity between the data represented as a vector space model, but also measures the semantic similarity between the data by exploiting the TagCluster for clustering. Through the experimental results on the RepLab2013 Twitter dataset, we show the effectiveness of the semantic-based K-means clustering algorithm.

A New Similarity Measure for Categorical Attribute-Based Clustering (범주형 속성 기반 군집화를 위한 새로운 유사 측도)

  • Kim, Min;Jeon, Joo-Hyuk;Woo, Kyung-Gu;Kim, Myoung-Ho
    • Journal of KIISE:Databases
    • /
    • v.37 no.2
    • /
    • pp.71-81
    • /
    • 2010
  • The problem of finding clusters is widely used in numerous applications, such as pattern recognition, image analysis, market analysis. The important factors that decide cluster quality are the similarity measure and the number of attributes. Similarity measures should be defined with respect to the data types. Existing similarity measures are well applicable to numerical attribute values. However, those measures do not work well when the data is described by categorical attributes, that is, when no inherent similarity measure between values. In high dimensional spaces, conventional clustering algorithms tend to break down because of sparsity of data points. To overcome this difficulty, a subspace clustering approach has been proposed. It is based on the observation that different clusters may exist in different subspaces. In this paper, we propose a new similarity measure for clustering of high dimensional categorical data. The measure is defined based on the fact that a good clustering is one where each cluster should have certain information that can distinguish it with other clusters. We also try to capture on the attribute dependencies. This study is meaningful because there has been no method to use both of them. Experimental results on real datasets show clusters obtained by our proposed similarity measure are good enough with respect to clustering accuracy.

Similarity-based Dynamic Clustering Using Radar Reflectivity Data (퍼지모델을 이용한 유사성 기반의 동적 클러스터링)

  • Lee, Han-Soo;Kim, Su-Dae;Kim, Yong-Hyun;Kim, Sung-Shin
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.219-222
    • /
    • 2011
  • There are number of methods that track the movement of an object or the change of state, such as Kalman filter, particle filter, dynamic clustering, and so on. Amongst these method, dynamic clustering method is an useful way to track cluster across multiple data frames and analyze their trend. In this paper we suggest the similarity-based dynamic clustering method, and verifies it's performance by simulation. Proposed dynamic clustering method is how to determine the same clusters for each continuative frame. The same clusters have similar characteristics across adjacent frames. The change pattern of cluster's characteristics in each time frame is throughly studied. Clusters in each time frames are matched against each others to see their similarity. Mamdani fuzzy model is used to determine similarity based matching algorithm. The proposed algorithm is applied to radar reflectivity data over time domain. We were able to observe time dependent characteristic of the clusters.

  • PDF