
A Comparative Study on Clustering Methods for Grouping Related Tags (연관 태그의 군집화를 위한 클러스터링 기법 비교 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.399-416 / 2009
  • In this study, clustering methods for related tags were examined as a way to improve search and exploration in the tag space. The experiments used 10 Delicious tags together with the strongly related tags extracted from 300 documents for each tag. Hierarchical and non-hierarchical clustering methods were applied on the basis of tag co-occurrences, and cluster relevance was measured to evaluate the results. Ward's method with the cosine coefficient, which is known to perform well for term clustering, performed best and showed a consistent clustering tendency. Further analysis showed that cluster membership among related tags reflects users' tagging purposes or interests and can disambiguate word senses. Tag clusters should therefore help improve search and exploration in the tag space.
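A minimal Python sketch of Ward's method with the cosine coefficient follows. It is an illustration under stated assumptions, not the study's code: each tag is assumed to be a vector of co-occurrence counts, and cosine similarity is converted to 2 * (1 - cos), which equals the squared Euclidean distance between length-normalized vectors, so that the standard Lance-Williams update for Ward's method applies. The function names and tag vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine coefficient between two tag co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ward_cosine(vectors, k):
    """Agglomerative Ward clustering on cosine dissimilarity, stopped at k clusters.

    For unit-length vectors the squared Euclidean distance equals 2 * (1 - cos),
    so that quantity is used as the Ward-compatible starting dissimilarity.
    """
    n = len(vectors)
    clusters = {i: [i] for i in range(n)}
    # d[(i, j)] with i < j holds the current dissimilarity between clusters i and j.
    d = {(i, j): 2.0 * (1.0 - cosine(vectors[i], vectors[j]))
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        a, b = min(d, key=d.get)  # closest pair of clusters
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        for c, members in clusters.items():
            nc = len(members)
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # Lance-Williams update for Ward's method.
            d[(c, next_id)] = ((na + nc) * dac + (nb + nc) * dbc - nc * d[(a, b)]) / (na + nb + nc)
        # Drop every entry that still refers to the two merged clusters.
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical co-occurrence counts of six tags with four context tags.
tags = ["python", "programming", "code", "recipe", "cooking", "food"]
vectors = [[5, 4, 0, 0], [4, 5, 1, 0], [5, 5, 0, 0],
           [0, 0, 5, 4], [0, 1, 4, 5], [0, 0, 5, 5]]
for cluster in ward_cosine(vectors, 2):
    print([tags[i] for i in cluster])
```

With these toy counts the programming-related tags and the cooking-related tags end up in separate clusters. Normalizing before applying Ward's criterion is one common way to combine Ward's method with cosine similarity; the study may have computed it differently.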

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report our observations on user-perspective issue clustering based on multi-layered two-mode network analysis. This work matters for companies that collect data about customer needs. Most companies have failed to uncover these needs for products or services properly from demographic data such as age, income level, and purchase history. Because they rely excessively on limited internal data, most recommendation systems do not give decision makers business information suited to current circumstances. Part of the problem is also the increasing regulation of personal-data collection and privacy. This makes demographic and transaction data harder to collect and is a significant hurdle for traditional recommendation approaches, which demand a great deal of personal data or transaction logs. Our motivation is our belief, supported by evidence, that most customers' product requirements can be analyzed effectively and efficiently from unstructured textual data such as Internet news text. To derive users' requirements from textual data obtained online, the proposed approach constructs two two-mode networks, a user-news network and a news-issue network, and integrates them into a single quasi-network that serves as the input for issue clustering. One contribution of this research is a methodology that uses large amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis techniques. Building multi-layered two-mode networks from news logs requires tools for text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides text-miner and cluster modules for textual data analysis, together with NetMiner 4 for network visualization and analysis.
Our approach to user-perspective issue clustering consists of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, a crawler collects visit logs for news sites. After the unstructured news article data are gathered, the topic analysis phase extracts issues from each news article to build an article-issue network; for simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The two two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues by structural equivalence and compare the result with clustering results from statistical tools and network analysis. An experiment on a large dataset was performed to build the multi-layer two-mode network, after which the issue-clustering results from SAS were compared with those from network analysis. The experimental dataset came from a website-ranking site and from the largest portal site in Korea; it contains 150 million transaction logs and 13,652 news articles for 5,000 panel members over one year. User-article and article-issue networks were constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results of SAS clustering. Despite extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. The proposed methodology, user-perspective issue clustering, can give companies practical support for decision-making because it augments user-related data with information extracted from unstructured text.
To overcome the problem of insufficient data in traditional approaches, our methodology infers customers' real interests from web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
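The network-merging and clustering phases can be sketched with plain matrix arithmetic. This is a hypothetical illustration, not the SAS or NetMiner workflow: it assumes that merging the user-article and article-issue networks means multiplying their adjacency matrices, that structural equivalence between issues is measured as the Euclidean distance between their user profiles, and it uses a simplified PAM (swap phase only, starting from the first k issues) instead of the full BUILD-and-SWAP algorithm. All data and names are made up.

```python
import math


def merge_two_mode(user_article, article_issue):
    """Merge user-article and article-issue two-mode networks into a
    user-issue quasi-network: w(u, i) = sum over articles a of UA[u][a] * AI[a][i]."""
    n_issues = len(article_issue[0])
    return [[sum(ua * article_issue[a][j] for a, ua in enumerate(row))
             for j in range(n_issues)]
            for row in user_article]


def issue_distances(user_issue):
    """Structural-equivalence distance: Euclidean distance between the
    columns (user profiles) of two issues."""
    cols = list(zip(*user_issue))
    return [[math.dist(p, q) for q in cols] for p in cols]


def pam(dist, k):
    """Simplified PAM: start from the first k items as medoids and keep
    swapping a medoid for a non-medoid while the total cost decreases."""
    n = len(dist)

    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    medoids = list(range(k))
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:slot] + [cand] + medoids[slot + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
    # Each issue is labeled with the index of its nearest medoid.
    labels = [min(range(k), key=lambda s: dist[i][medoids[s]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy data: 4 users x 4 articles, 4 articles x 4 issues.
user_article = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
article_issue = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
user_issue = merge_two_mode(user_article, article_issue)
medoids, labels = pam(issue_distances(user_issue), 2)
print(user_issue, labels)
```

Issues read by the same users receive identical columns in the merged matrix, so they have zero structural-equivalence distance and fall into the same PAM cluster.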

A Study on Efficient Classification of Pattern Using Object Oriented Relationship between Design Patterns

  • Kim Gui-Jung;Han Jung-Soo
    • International Journal of Contents / v.2 no.3 / pp.11-17 / 2006
  • Clustering is a representative method for classifying components. Previous clustering methods based on cohesion and coupling are not effective for design patterns, because design patterns focus on the relationships between classes. In this paper, we classified design patterns using features of their object-oriented relationships. Classification by clustering showed higher precision than facet-based classification, which indicates that automatic clustering algorithms are effective for classifying design patterns. When patterns are retrieved, similar patterns can be compared easily because they are stored in the same category. The repository can also be managed efficiently because patterns are stored together with link information.
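The abstract does not list the relationship features that were used, so the following sketch only illustrates the general idea under assumptions: each pattern is described by a hypothetical set of object-oriented relationship types, similarity is the Jaccard coefficient, and patterns are grouped by single-link clustering at a chosen threshold. The pattern names P1 to P4 and their feature sets are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of relationship features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_patterns(features, threshold):
    """Single-link grouping with union-find: patterns whose relationship
    sets have Jaccard similarity >= threshold share a category."""
    names = sorted(features)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical relationship features for four unnamed patterns.
features = {
    "P1": {"inheritance", "composition"},
    "P2": {"inheritance", "composition", "delegation"},
    "P3": {"association", "dependency"},
    "P4": {"association", "dependency", "aggregation"},
}
print(group_patterns(features, 0.5))
```

Raising the threshold yields finer categories; at 0.9 every pattern in this toy set stands alone.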

A Study on Extracting Customer Emotions from Blog and Clustering for Target Marketing (고객 Clustering을 위한 Blog 감성 추출에 대한 연구)

  • Bae, Sang-Keun;Kang, Jae-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2008.05a / pp.403-406 / 2008
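  • Blogs serve as a communication channel through which individuals express and share subtle feelings and emotions, and they have become accessible to anyone. This gives companies an excellent opportunity to move away from inefficient conventional mass-marketing methods and to carry out refined target marketing by naturally extracting consumers' emotions. However, extracting subtle emotional indices from customers' blogs and applying them to marketing is not easy. To address this problem, this paper studies blog emotion extraction for customer clustering that can be used in target marketing, using the blogs registered in customers' membership information. The dominant saturation and brightness are extracted from each blog's main skin image and quantified, and on this basis the customers' blogs are clustered into the empirically validated emotion groups of the taste scale method (*a method developed by 일본감성연구소, a Japanese sensibility research institute). A survey of emotional responses to related color schemes was then conducted with the users of each clustered blog. The results were statistically significant, suggesting that the approach could be used for target marketing based on customer emotions.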
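The color-feature step can be illustrated in a few lines. This sketch assumes the skin image is available as a list of RGB pixels, uses the mean HSV saturation and value as a stand-in for the "dominant" saturation and brightness, and assigns each blog to the nearest of a few hypothetical group centroids; the actual taste scale groups and their coordinates are not given in the abstract.

```python
import colorsys
import math


def tone_features(pixels):
    """Mean saturation and brightness (HSV value) of RGB pixels given in 0-255."""
    s_total = v_total = 0.0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        s_total += s
        v_total += v
    return s_total / len(pixels), v_total / len(pixels)


def assign_group(features, centroids):
    """Nearest-centroid assignment in the (saturation, brightness) plane."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


# Hypothetical group centroids; not the taste scale method's real groups.
centroids = {"vivid": (0.9, 0.9), "pale": (0.2, 0.9), "dark": (0.5, 0.2)}
skin_pixels = [(230, 230, 240), (220, 225, 235), (240, 240, 250)]
print(assign_group(tone_features(skin_pixels), centroids))
```

A light, low-saturation skin like this one lands in the hypothetical "pale" group; a pure red skin would land in "vivid".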

A Study on Density-Based Clustering Method Considering Directionality (방향성을 고려한 밀도 기반 클러스터링 기법에 관한 연구)

  • Jinman Kim;Joongjin Kook
    • Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.38-44 / 2024
  • This research proposed DBSCAN-D, a clustering technique that builds on existing density-based clustering research to locate points of interest (POIs) in data generated by moving objects, such as GPS data. The method is based on "staying time" and "directionality", both extracted from the relationships between GPS data points. Staying time is obtained from the difference between the reception times of GPS data points. Directionality exploits the fact that GPS data are generated sequentially: the search area of a later data point is shifted toward the position of the previously generated data point. Using these two properties, clustering can be tailored to data sets generated by moving objects.
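The abstract names the two ingredients of DBSCAN-D but not its exact rules. The following is one possible reading, sketched under assumptions: each GPS fix is an (x, y, t) tuple, staying time is the gap to the next fix, directionality moves a fix's search area part of the way (a hypothetical `shift` factor) toward the previous fix, and a fix counts as a core point when the staying times inside that area add up to a hypothetical `min_stay` threshold, rather than when a neighbor count is reached.

```python
import math


def staying_times(points):
    """Staying time of each GPS fix (x, y, t): gap until the next fix; 0 for the last."""
    gaps = [points[i + 1][2] - points[i][2] for i in range(len(points) - 1)]
    return gaps + [0]


def shifted_center(points, i, shift):
    """Center of fix i's search area, moved toward the previous fix by `shift` (0..1)."""
    x, y, _ = points[i]
    if i == 0:
        return x, y
    px, py, _ = points[i - 1]
    return x + shift * (px - x), y + shift * (py - y)


def dbscan_d(points, eps, min_stay, shift=0.5):
    """DBSCAN variant: a fix is a core point when the fixes inside its shifted
    eps-neighborhood have a total staying time of at least min_stay.
    Returns one label per fix; -1 marks noise."""
    n = len(points)
    stay = staying_times(points)

    def neighbors(i):
        center = shifted_center(points, i, shift)
        return [j for j in range(n) if math.dist(center, points[j][:2]) <= eps]

    def is_core(nb):
        return sum(stay[j] for j in nb) >= min_stay

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if not is_core(nb):
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if is_core(nb_j):
                queue.extend(nb_j)
    return labels


# Hypothetical track: a stop near (0, 0), a quick transit, then a stop near (20, 20); t in seconds.
track = [(0, 0, 0), (0.1, 0, 60), (0, 0.1, 120), (0.1, 0.1, 180), (5, 5, 190),
         (10, 10, 200), (20, 20, 210), (20.1, 20, 270), (20, 20.1, 330), (20.1, 20.1, 390)]
print(dbscan_d(track, eps=1.0, min_stay=100))
```

With these toy values the two stops become clusters 0 and 1 and the transit fixes are labeled noise. The first fix of the second stop has its search area pulled back toward the transit path and is recovered only as a border point, which shows how directionality changes neighborhoods compared with plain DBSCAN.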

i-LEACH : Head-node Constrained Clustering Algorithm for Randomly-Deployed WSN (i-LEACH : 랜덤배치 고정형 WSN에서 헤더수 고정 클러스터링 알고리즘)

  • Kim, Chang-Joon;Lee, Doo-Wan;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.198-204 / 2012
  • In general, clustering the sensor nodes of a WSN is a useful mechanism for coping with scalability problems and, when combined with network data aggregation, can increase the network's energy efficiency. Hierarchical clustering routing is a typical approach to improving the overall energy efficiency of a network: it selects cluster heads that forward the data aggregated from the nodes in their cluster to a base station. In this paper, we propose improved-LEACH (i-LEACH), which uses a comparatively simple and lightweight policy to select cluster-head nodes, reducing both the clustering overhead and the overall power consumption of the network. Simulations with a fine-grained power model show that i-LEACH reduces clustering overhead compared with well-known previous work such as LEACH. Compared with LEACH, i-LEACH improved network power consumption by 25% and network traffic by 16%.
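The classic LEACH rule elects cluster heads probabilistically. A node that has not been a head in the last 1/P rounds becomes one if a uniform random draw falls below T(n) = P / (1 - P * (r mod 1/P)), so the number of heads varies from round to round. The abstract does not spell out i-LEACH's selection policy. `fixed_count_heads` below is therefore a hypothetical head-count-constrained stand-in, suggested only by the title's "head-node constrained" wording, that always draws exactly k heads from the eligible nodes.

```python
import random


def leach_threshold(p, r, eligible):
    """Classic LEACH threshold T(n) for round r and desired head fraction p."""
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))


def leach_heads(nodes, p, r, last_head_round, rng):
    """Probabilistic LEACH election: the head count varies between rounds."""
    period = round(1 / p)
    heads = []
    for node in nodes:
        eligible = r - last_head_round.get(node, -period) >= period
        if rng.random() < leach_threshold(p, r, eligible):
            heads.append(node)
    return heads


def fixed_count_heads(nodes, k, r, period, last_head_round, rng):
    """Hypothetical head-count-constrained election: exactly k heads, drawn
    uniformly from nodes that have not been a head in the last `period` rounds."""
    eligible = [n for n in nodes if r - last_head_round.get(n, -period) >= period]
    if len(eligible) < k:
        eligible = list(nodes)  # fall back to all nodes if too few are eligible
    return rng.sample(eligible, k)


rng = random.Random(7)
nodes = list(range(100))
print(len(leach_heads(nodes, 0.05, 0, {}, rng)))        # varies with the seed
print(len(fixed_count_heads(nodes, 5, 0, 20, {}, rng)))  # always 5
```

Fixing the head count removes the round-to-round variance of probabilistic election, which is one plausible way a simpler head-selection policy could cut clustering overhead.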

Movie Recommendation Using Co-Clustering by Infinite Relational Models (Infinite Relational Model 기반 Co-Clustering을 이용한 영화 추천)

  • Kim, Byoung-Hee;Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.4 / pp.443-449 / 2014
  • Users' movie preferences are observable outcomes of various factors related to user attributes and movie features. Movie recommendation therefore requires methods for analyzing the relations among users, movies, and preference patterns. As a relational analysis tool, we focus on the Infinite Relational Model (IRM), which was introduced as a tool for discovering multiple concepts. We show that IRM-based co-clustering of preference patterns and movie descriptors can serve as a first-line tool for movie recommendation methods, especially content-based filtering. By introducing a set of well-defined tags for movies and performing three-way co-clustering on a movie-rating matrix and a movie-tag matrix, we discovered various explainable relations among users and movies. We suggest several uses of IRM-based co-clustering, especially for incremental and dynamic recommender systems.
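Two ingredients of the IRM can be sketched compactly. The IRM places a Chinese restaurant process (CRP) prior on each entity type's partition, which is why the number of clusters does not have to be fixed in advance. It scores a co-clustering by integrating out a Beta-distributed link probability for every block. This sketch covers only those two pieces, under assumptions (binary data, a symmetric Beta(beta, beta) prior, hypothetical function names). It omits the sampling used to search over partitions and is not the authors' three-way co-clustering.

```python
import math
import random


def crp_partition(n, alpha, rng):
    """Sample cluster labels for n items from a Chinese restaurant process."""
    labels, sizes = [], []
    for i in range(n):
        # Existing cluster c w.p. sizes[c] / (i + alpha); new cluster w.p. alpha / (i + alpha).
        weights = sizes + [alpha]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(sizes):
            sizes.append(1)
        else:
            sizes[c] += 1
        labels.append(c)
    return labels


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def block_log_likelihood(matrix, row_labels, col_labels, beta=1.0):
    """IRM-style marginal likelihood of a binary matrix under a co-clustering:
    each (row cluster, column cluster) block has a Beta(beta, beta) link
    probability that is integrated out analytically."""
    counts = {}
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones, zeros = counts.get(key, (0, 0))
            counts[key] = (ones + x, zeros + 1 - x)
    return sum(log_beta(m1 + beta, m0 + beta) - log_beta(beta, beta)
               for m1, m0 in counts.values())


# Hypothetical user x movie "liked" matrix with two obvious blocks.
likes = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
good = block_log_likelihood(likes, [0, 0, 1, 1], [0, 0, 1, 1])
bad = block_log_likelihood(likes, [0, 1, 0, 1], [0, 1, 0, 1])
print(good > bad)  # True
print(crp_partition(10, 1.0, random.Random(0)))
```

The co-clustering that matches the block structure yields pure blocks and a higher marginal likelihood than one that mixes them, which is the signal an IRM sampler follows.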

Property-based Hierarchical Clustering of Peers using Mobile Agent for Unstructured P2P Systems (비구조화 P2P 시스템에서 이동에이전트를 이용한 Peer의 속성기반 계층적 클러스터링)

  • Salvo, Michael Angel G.;Mateo, Romeo Mark A.;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.10 no.4 / pp.189-198 / 2009
  • Unstructured peer-to-peer systems are the most commonly used P2P systems on today's Internet. However, file placement in these systems is random and no correlation exists between peers and their contents, so flooding queries are not guaranteed to find the desired data. In this paper, we propose clustering the nodes of unstructured P2P systems with the agglomerative hierarchical clustering algorithm to improve search. We compared the time needed to cluster the nodes with our proposed algorithm and with the k-means clustering algorithm. We also simulated the delay in locating data in a network topology and recorded the system overhead with our proposed algorithm, with k-means clustering, and without clustering. Simulation results show that our proposed algorithm yields a shorter delay than the other methods and also reduces resource overhead.
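The abstract does not say which linkage criterion or peer properties were used. As a hypothetical sketch, the following clusters peers described by numeric property vectors with average-linkage agglomerative clustering and stops when k clusters remain; the property vectors are invented for illustration.

```python
import math


def average_linkage(points, k):
    """Agglomerative hierarchical clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Mean pairwise Euclidean distance between members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical peer property vectors, e.g. share of files in two content categories.
peers = [(9, 1), (8, 2), (1, 9), (2, 8), (9, 0)]
print(average_linkage(peers, 2))  # [[0, 1, 4], [2, 3]]
```

Peers that hold similar content end up in the same cluster, so a query could be routed to one cluster instead of being flooded across the whole overlay.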

Fuzzy Relevance-Based Clustering for Routing Performance Enhancement in Wireless Ad-Hoc Networks (무선 애드 혹 네트워크상에서 라우팅 성능 향상을 위한 퍼지 적합도 기반 클러스터링)

  • Lee, Chong-Deuk
    • Journal of Advanced Navigation Technology / v.14 no.4 / pp.495-503 / 2010
  • Clustering is an important mechanism that provides information to mobile nodes efficiently and improves routing capacity and bandwidth allocation. This paper proposes a clustering scheme based on a fuzzy relevance degree to enhance routing performance and to address problems such as uneven node distribution, which arise from the mobility-driven dynamics and flat structure of these networks. The proposed scheme uses the fuzzy relevance degree $\alpha$ to select the cluster head (CH) when clustering in the FSV (Fuzzy State Viewing) structure; that is, $\alpha$ governs CH selection during clustering in FSV. The scheme is designed to solve problems found in existing 1-hop and 2-hop clustering schemes. Its performance was verified by simulation in the NS-2 simulator, where it was compared with Lowest-ID, MOBIC, and SCA. The simulation results showed that the proposed scheme performs better than these existing schemes.
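The abstract does not define how the fuzzy relevance degree is computed. The following is a hypothetical stand-in that shows the general mechanism. Each candidate node gets membership degrees for two assumed criteria: connectivity, from node degree, and stability, from speed. The degrees are combined with the min t-norm into a single value alpha, and the node with the highest alpha becomes the cluster head. The criteria, breakpoints, and node data are all assumptions.

```python
def rising(x, lo, hi):
    """Linear membership rising from 0 (at or below lo) to 1 (at or above hi)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def relevance(node):
    """Hypothetical fuzzy relevance degree alpha of a node for the cluster-head
    role: min t-norm (fuzzy AND) of a connectivity and a stability membership."""
    connectivity = rising(node["degree"], 1, 6)         # more neighbors is better
    stability = 1.0 - rising(node["speed"], 0.0, 10.0)  # slower nodes are better
    return min(connectivity, stability)


def select_cluster_head(nodes):
    """Pick the node with the highest relevance degree alpha."""
    return max(nodes, key=lambda name: relevance(nodes[name]))


nodes = {
    "A": {"degree": 5, "speed": 8.0},  # well connected but fast
    "B": {"degree": 4, "speed": 1.0},  # reasonably connected and slow
    "C": {"degree": 2, "speed": 0.5},  # slow but poorly connected
}
print(select_cluster_head(nodes))  # B
```

The min t-norm rewards nodes that are acceptable on every criterion, so a node that excels on one criterion but fails another is not chosen.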

Gene Expression Data Analysis Using Seed Clustering (시드 클러스터링 방법에 의한 유전자 발현 데이터 분석)

  • Shin Myoung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.1 / pp.1-7 / 2005
  • Cluster analysis of microarray data has often been used to find biologically relevant groups of genes based on their expression levels. Because many functionally related genes tend to be co-expressed, identifying groups of genes with similar expression profiles allows the functions of unknown genes to be inferred from those of known genes in the same group. In this paper, we present a novel clustering approach called seed clustering and investigate its applicability to microarray data analysis. In seed clustering, seed genes are first extracted by computational analysis of their expression profiles, and clusters are then generated by taking the seed genes as prototype vectors for the target clusters. Because it has a strong mathematical foundation, seed clustering produces stable and consistent results in a systematic way. Our empirical results also indicate that the automatically extracted seed genes represent the potential clusters hidden in the data well and that the method compares favorably with current approaches.
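The cluster-generation step of seed clustering can be sketched as a nearest-prototype assignment. This is a simplified illustration under assumptions: the seed genes are given by hand rather than extracted computationally, similarity is Pearson correlation, and genes are assigned in a single pass. The paper's seed-extraction procedure and any refinement steps are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def seed_cluster(profiles, seeds):
    """One assignment pass: each gene joins the seed (prototype) whose profile
    it is most positively correlated with."""
    clusters = {seed: [] for seed in seeds}
    for gene, profile in profiles.items():
        best = max(seeds, key=lambda s: pearson(profile, profiles[s]))
        clusters[best].append(gene)
    return clusters


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 3.0, 4.0, 6.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [5.0, 4.0, 2.0, 1.0],
}
print(seed_cluster(profiles, ["g1", "g3"]))
```

Because every gene is assigned to a fixed prototype, the result does not depend on random initialization, which is consistent with the stability the abstract attributes to the method.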