A New Type of Clustering Problem with Two Objectives (복수 목적함수를 갖는 새로운 형태의 집단분할 문제)

  • Lee, Jae-Yeong
    • Journal of Korean Institute of Industrial Engineers / v.24 no.1 / pp.145-156 / 1998
  • In a classical clustering problem, elements are grouped on the basis of similarities or distances (dissimilarities) among them, so the objective is to minimize the variance within each group while maximizing the variance between groups. This paper introduces a new class of clustering problem, which we call the laydown grouping problem (LGP). In LGP, the objective is to minimize both the within-group and between-group variances. The problem is further extended to the multi-dimensional case, where this two-way minimization must be carried out simultaneously for every measurement characteristic. First, the problem is assessed by analyzing its variance structure and complexity, and LGP is conjectured to be NP-complete. Then the simulated annealing (SA) algorithm is applied, and its results are compared with those of other methods.

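As a rough illustration of the two-way minimization described in the abstract above, the following sketch applies simulated annealing to a one-dimensional grouping task. The cost function (sum of within-group variances plus the variance of the group means), the equal group sizes, the swap move, and the cooling schedule are all assumptions made for illustration; the abstract does not give the paper's actual formulation.

```python
import math
import random
from statistics import mean, pvariance


def lgp_cost(groups):
    """Assumed cost: total within-group variance plus variance of the group means."""
    within = sum(pvariance(g) for g in groups)
    between = pvariance([mean(g) for g in groups])
    return within + between


def anneal_lgp(values, n_groups, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over equal-size groupings, using pairwise swaps."""
    if n_groups < 2 or len(values) % n_groups != 0:
        raise ValueError("need at least 2 groups and an evenly divisible input")
    rng = random.Random(seed)
    items = list(values)
    rng.shuffle(items)
    size = len(items) // n_groups
    groups = [items[i * size:(i + 1) * size] for i in range(n_groups)]
    cost = lgp_cost(groups)
    best, best_cost = [g[:] for g in groups], cost
    temp = t0
    for _ in range(steps):
        # Propose swapping one element between two different groups.
        a, b = rng.sample(range(n_groups), 2)
        i, j = rng.randrange(size), rng.randrange(size)
        groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
        new_cost = lgp_cost(groups)
        # Accept improvements always, and worse moves with a temperature-dependent probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = [g[:] for g in groups], cost
        else:
            # Rejected: undo the swap.
            groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
        temp *= cooling
    return best, best_cost
```

Calling `anneal_lgp(list(range(12)), 3)` returns three groups of four values together with the assumed cost of that grouping.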

Support Vector Machine based Cluster Merging (Support Vector Machines 기반의 클러스터 결합 기법)

  • Choi, Byung-In;Rhee, Frank Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.3 / pp.369-374 / 2004
  • This paper proposes a cluster merging algorithm that merges the convex clusters produced by the Fuzzy Convex Clustering (FCC) method into non-convex clusters. The algorithm relies on a fast and reliable distance measure between two convex clusters, computed using Support Vector Machines (SVM), that improves accuracy and speed over existing conventional methods. As a result, the number of clusters can be reduced without losing the clusters' representation of the data. Results for several data sets are given to show the validity of the distance measure and the algorithm.

A Study on Partial Pattern Estimation for Sequential Agglomerative Hierarchical Nested Model (SAHN 모델의 부분적 패턴 추정 방법에 대한 연구)

  • Jang, Kyung-Won;Ahn, Tae-Chon
    • Proceedings of the KIEE Conference / 2005.10b / pp.143-145 / 2005
  • This paper presents an empirical study of a pattern estimation method that reveals underlying data patterns at a reduced computational cost. The method performs crisp clustering of n data samples by means of the sequential agglomerative hierarchical nested (SAHN) model. Conventional SAHN-based clustering requires a large computation time in the initial step of the algorithm. To address this, we modify the overall process with a partial approach: the data set is first divided into several subgroups by uniform sampling, and the SAHN-based method is then applied to each subgroup. This reduces the computation time of the original process while giving similar results. The proposed method is applied to several test data sets, and simulation results are presented with a conceptual analysis.

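The partial approach in the abstract above can be sketched roughly as follows: shuffle the data, split it into subgroups, and run an agglomerative procedure on each subgroup separately. The single-linkage criterion, the one-dimensional points, and the function names `sahn` and `partial_sahn` are assumptions for illustration. The abstract does not say how the per-subgroup results are combined, so the sketch stops at the per-subgroup clusters.

```python
import random


def sahn(points, n_clusters):
    """Naive single-linkage agglomerative clustering on 1-D points."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    return clusters


def partial_sahn(points, n_subgroups, n_clusters, seed=0):
    """Split the data by uniform random sampling, then cluster each subgroup."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    subgroups = [shuffled[k::n_subgroups] for k in range(n_subgroups)]
    return [sahn(sub, n_clusters) for sub in subgroups]
```

Because the naive pairwise search is the expensive part, running it on several smaller subgroups instead of the full data set is where the saving would come from in this sketch.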

Similarity measure for P2P processing of semantic data (시맨틱웹 데이터의 P2P 처리를 위한 유사도 측정)

  • Kim, Byung Gon;Kim, Youn Hee
    • Journal of Korea Society of Digital Industry and Information Management / v.6 no.4 / pp.11-20 / 2010
  • Ontologies play an important role in the semantic web for constructing and querying semantic data. Because ontologies are dynamic, a P2P environment is considered for processing them on the web. For efficient ontology processing in a P2P environment, peers should be clustered. When a new peer is added to the network, allocating it to a cluster is important for system efficiency. To cluster peers with similar characteristics, a method is needed for measuring the similarity between the ontology of the added peer and the ontologies in existing clusters. In this paper, we propose similarity measure techniques for ontologies, for use in clustering peers. The proposed measure considers structural characteristics of an ontology such as schema, class, and property. Experimental results show that ontologies with similar topics, classes, and properties can be allocated to the same cluster.

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

  • 류중원;조성배
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.187-190 / 2001
  • With the recent proliferation of the internet, the need for automatic document categorization is widely acknowledged to be growing. Because classifying documents manually is time-consuming and labor-intensive, a system is needed that categorizes them automatically according to their contents. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. The first clusters all documents into 3 groups so that a first classifier assigns each input document to its group, and then obtains the final result from a classifier trained separately for each group. The second successively classifies the most distinct classes first, according to their similarity. Neural networks are used as classifiers, and dendrograms are used to show the hierarchical structure of similarities between classes. A comparison of hierarchical and non-hierarchical classifiers indicates that the clustering methods improve classification efficiency.


Single Pass Algorithm for Text Clustering by Encoding Documents into Tables

  • Jo, Tae-Ho
    • Journal of Korea Multimedia Society / v.11 no.12 / pp.1749-1757 / 2008
  • This research proposes a modified version of the single pass algorithm specialized for text clustering. Encoding documents into numerical vectors, as the traditional single pass algorithm requires, causes two main problems: huge dimensionality and sparse distribution. To address these problems, this research modifies the single pass algorithm so that documents are encoded into forms other than numerical vectors. In the proposed version, documents are mapped into tables, and an operation on two tables is defined for use by the single pass algorithm. The goal of this research is to improve the performance of the single pass algorithm for text clustering through this specialized version.

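A minimal sketch of a single pass algorithm over table-encoded documents, in the spirit of the abstract above, might look like this. The table format (word to frequency), the similarity operation (shared weight divided by combined weight), the threshold value, and the rule for merging a document's table into its cluster are assumptions for illustration; the abstract does not define the paper's actual operation on two tables.

```python
from collections import Counter


def to_table(text):
    """Encode a document as a table mapping each word to its frequency."""
    return Counter(text.lower().split())


def table_similarity(t1, t2):
    """Assumed operation on two tables: shared weight over combined weight."""
    shared = sum(min(t1[w], t2[w]) for w in t1.keys() & t2.keys())
    total = sum(t1.values()) + sum(t2.values()) - shared
    return shared / total if total else 0.0


def single_pass(docs, threshold=0.3):
    """Visit each document once; join the most similar cluster or start a new one."""
    clusters = []
    for idx, doc in enumerate(docs):
        table = to_table(doc)
        best_sim, best = 0.0, None
        for cluster in clusters:
            sim = table_similarity(table, cluster["table"])
            if sim > best_sim:
                best_sim, best = sim, cluster
        if best is not None and best_sim >= threshold:
            # Merge the document's table into the cluster's table.
            best["table"].update(table)
            best["members"].append(idx)
        else:
            clusters.append({"table": table, "members": [idx]})
    return [c["members"] for c in clusters]
```

Because each table only stores the words that actually occur in a document, this encoding avoids the large, sparse vectors that the abstract identifies as the main problem.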

Gated Multi-channel Network Embedding for Large-scale Mobile App Clustering

  • Yeo-Chan Yoon;Soo Kyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.6 / pp.1620-1634 / 2023
  • This paper studies the task of embedding nodes with multiple graphs representing multiple information channels, which is useful in many network clustering tasks. By learning a node from multiple graphs, various characteristics of the node can be represented and embedded stably. Existing studies using multi-channel networks either integrate heterogeneous graphs or constrain common nodes that appear in multiple graphs to have similar embeddings. Although these methods represent nodes effectively, they are limited by the assumption that all networks provide the same amount of information. This paper proposes a method to overcome these limitations. The proposed method gives different weights according to the source graph when embedding nodes, so that the characteristics of the graph with more important information are reflected more strongly in the node. To this end, a novel method incorporating a multi-channel gate layer is proposed to weight more important channels and ignore unnecessary data when embedding a node with multiple graphs. Empirical experiments demonstrate the effectiveness of the proposed multi-channel-based embedding methods.
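The idea of weighting channels differently can be illustrated with a simple softmax gate over per-channel scores. This is only a sketch under stated assumptions: the function names, the softmax form of the gate, and the use of fixed scores passed in as arguments are all hypothetical. In the paper the gate is a learned layer whose details the abstract does not give.

```python
import math


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_embedding(channel_embeddings, gate_scores):
    """Combine one node's per-channel embeddings using gate weights."""
    weights = softmax(gate_scores)
    dim = len(channel_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, channel_embeddings))
        for d in range(dim)
    ]
```

With equal scores every channel contributes equally, while a much larger score for one channel lets that channel dominate the combined embedding.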

AN EFFICIENT DENSITY BASED ANT COLONY APPROACH ON WEB DOCUMENT CLUSTERING

  • M. REKA
    • Journal of applied mathematics & informatics / v.41 no.6 / pp.1327-1339 / 2023
  • Use of the World Wide Web (WWW) has been increasing recently as users need more information, and the volume of document information available to end users through the internet keeps growing. The web's document search process is essential for finding documents relevant to user queries. As the number of web pages increases, it becomes increasingly challenging for users to find records that match their interests. However, existing Document Information Retrieval (DIR) approaches are time-consuming for large document collections. To alleviate the problem, this paper presents Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) for user-query-based DIR. The first stage applies the Term Frequency Weight (TFW) technique to identify the query weightage-based frequency. Based on the weight score, documents are grouped and ranked using the proposed Spatial Clustering Ranking Pattern (SCRP) technique. Finally, based on the ranking, the DACIR algorithm retrieves the documents with the most relevant information. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while performing significantly better in run time.

A Study of FRBR Implementation to Catalog by Using Work Clustering (저작 클러스터링 분석을 통한 FRBR의 목록 적용에 관한 연구)

  • Lee, Mi-Hwa;Chung, Yeon-Kyoung
    • Journal of the Korean Society for information Management / v.25 no.3 / pp.65-82 / 2008
  • The purposes of this study are to explore FRBR utilities such as work clustering and expression clustering, to identify problems in applying FRBR by developing a work and expression clustering algorithm and implementing it in a cataloging system, and to suggest new cataloging rules for FRBR and guidelines for MARC description that improve FRBR work clustering. FRBR was proposed because the growth in searchable materials and multi-version materials made the collocation function of bibliographic records necessary, but FRBRization has problems such as imperfect conversion of bibliographic records to FRBR records and the inappropriateness of current cataloging rules for FRBR. Bibliographic records must be processed by an FRBR algorithm to construct a FRBRized system, but bibliographic records and current cataloging rules cannot fully support FRBRization. Therefore, cataloging rules and guidelines for MARC description for FRBR are needed. To construct a FRBRized cataloging system in Korea, problems and solutions need to be identified through practical application of FRBR, such as developing an FRBR algorithm and applying it to cataloging records.

Enhancement of the k-Means Clustering Speed by Emulation of Birds' Motion in Flock (새떼 이동의 모방에 의한 k-평균 군집 속도의 향상)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.9 / pp.965-970 / 2014
  • To improve the convergence speed of k-means clustering, we introduce the notion of birds' movement in a flock. Their motion is characterized by the observation that each bird follows its nearest neighbor. We use this feature in the clustering procedure: once the class of a vector is determined, a number of vectors in its vicinity are assigned to the same class. Experiments show that the number of iterations required for termination is significantly lower in the proposed method than in the conventional one. Furthermore, the calculation time per iteration is more than 5% shorter in the proposed method. The quality of the clustering, as measured by the total accumulated distance between each vector and its centroid vector, was found to be practically the same. In other words, practically the same clustering result can be obtained in a shorter computational time.
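The assignment rule described in the abstract above can be sketched as a variant of the k-means assignment step. The details here are assumptions for illustration: the name `flock_kmeans`, the parameter `n_followers`, precomputing each vector's nearest neighbours once, and visiting vectors in index order are not taken from the paper.

```python
import math
import random


def flock_kmeans(points, k, n_followers=2, max_iter=100, seed=0):
    """k-means in which nearby vectors inherit the class of an already classified vector."""
    rng = random.Random(seed)
    n = len(points)
    # Precompute each vector's nearest neighbours once (assumed strategy).
    neighbours = [
        sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:n_followers]
        for i in range(n)
    ]
    centroids = rng.sample(points, k)
    labels = [None] * n
    for _ in range(max_iter):
        new_labels = [None] * n
        for i in range(n):
            if new_labels[i] is not None:
                continue  # already assigned as a follower; skip the centroid search
            c = min(range(k), key=lambda c: math.dist(points[i], centroids[c]))
            new_labels[i] = c
            # Followers take the same class without a centroid search of their own.
            for j in neighbours[i]:
                if new_labels[j] is None:
                    new_labels[j] = c
        if new_labels == labels:
            break
        labels = new_labels
        # Standard centroid update; empty clusters keep their old centroid.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids
```

In this sketch the saving per iteration comes from skipping the nearest-centroid search for every vector that was already labeled as a follower.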