• Title/Abstract/Keywords: Two Phase Clustering

Search results: 48 (processing time: 0.026 s)

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

  • 황인수
    • Korean Management Science Review / Vol. 19, No. 1 / pp.189-196 / 2002
  • Data clustering is often one of the first steps in data mining analysis. It identifies groups of related objects that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper presents the development of a two-phase hierarchical clustering algorithm for group formation. Applications of the algorithm to product-customer group formation in customer relationship management are also discussed. In computer simulations, the suggested algorithm outperforms the single-link method and k-means clustering.

Analysis of Mechanical Response of Two-phase Polycrystalline Microstructures with Distinctive Topology of Phase Clustering (2상 다결정 미세구조의 상 분포 위상에 따른 역학적 거동 분석)

  • Chung, Sang-Yeop;Han, Tong-Seok
    • Journal of the Computational Structural Engineering Institute of Korea / Vol. 24, No. 1 / pp.9-16 / 2011
  • Understanding the phase distribution in a multi-phase polycrystalline material is important because it can affect material properties and mechanical behavior. A proper method is needed to describe the phase distribution. For this purpose, contiguity and probability functions (two-point correlation and lineal-path functions) are investigated for representing the phase distributions of microstructures. The mechanical behaviors are evaluated using the finite element method. The characteristics of the probability functions and the mechanical responses of virtual samples are presented. It is confirmed that the topology of phase clustering affects the mechanical behavior of materials and that the strength is reduced as the clustering size increases.
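The two-point correlation function mentioned in this abstract gives the probability that two points separated by a distance r both lie in the phase of interest. The following is a minimal sketch only, assuming a 1D binary microstructure with periodic boundaries; the function name and the sample arrays are illustrative and are not taken from the paper.

```python
def two_point_correlation(phase, r):
    """Estimate S2(r): fraction of point pairs (i, i + r) that both lie in phase 1.

    Assumes a 1D binary microstructure with periodic boundaries (illustrative only).
    """
    n = len(phase)
    hits = sum(1 for i in range(n) if phase[i] == 1 and phase[(i + r) % n] == 1)
    return hits / n


# Two hypothetical samples with the same volume fraction (0.5) of phase 1.
clustered = [1, 1, 1, 1, 0, 0, 0, 0]  # phase 1 gathered in one block
dispersed = [1, 0, 1, 0, 1, 0, 1, 0]  # phase 1 finely mixed with phase 0

print(two_point_correlation(clustered, 1))  # 0.375
print(two_point_correlation(dispersed, 1))  # 0.0
```

At r = 0 both samples return the volume fraction, so the difference at r = 1 reflects only how the phase is arranged, which is the kind of topological information the abstract refers to.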

A Study on Web-User Clustering Algorithm for Web Personalization (웹 개인화를 위한 웹사용자 클러스터링 알고리즘에 관한 연구)

  • Lee, Hae-Kag
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 12, No. 5 / pp.2375-2382 / 2011
  • User clustering for web navigation pattern discovery is very useful for obtaining the preferences and behavior patterns of users for web pages. In addition, the information obtained by user clustering is essential for web personalization and customer grouping. In this paper, an algorithm for clustering the web navigation paths of users is proposed, and some special navigation patterns can be recognized by the algorithm. The proposed algorithm has two clustering phases. In the first phase, all paths are classified into k groups on the basis of their similarities. The initial solution obtained in the first phase is not a global optimum, but it gives a good, feasible initial solution for the second phase. In the second phase, the first-phase solution is improved by a revised k-means algorithm. In the revised k-means algorithm, the paths are grouped by a hyperplane instead of by the distance between a path and a group center. Experimental results show that the proposed method is more efficient.
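The two-phase structure described here (an initial grouping that is then refined) can be sketched as follows. This is a hedged illustration, not the paper's method: it uses standard distance-based k-means updates for the second phase, because the abstract does not specify the hyperplane rule, and it assumes each navigation path has already been encoded as a numeric feature vector. The function name `refine_with_kmeans` is hypothetical.

```python
def refine_with_kmeans(points, labels, k, max_iter=100):
    """Phase two: improve an initial grouping with standard k-means updates.

    `labels` is the phase-one solution (one group index per point).
    Note: plain distance-based reassignment, not the paper's hyperplane rule.
    """
    labels = list(labels)
    dim = len(points[0])
    for _ in range(max_iter):
        # Recompute each group center as the mean of its current members.
        centers = []
        for g in range(k):
            members = [p for p, l in zip(points, labels) if l == g]
            if not members:  # fall back to a data point if a group is empty
                members = [points[g]]
            centers.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
            )
        # Reassign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda g: sum((p[d] - centers[g][d]) ** 2 for d in range(dim)))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
    return labels


# Hypothetical path features and a deliberately poor phase-one grouping.
paths = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(refine_with_kmeans(paths, [0, 1, 0, 1], k=2))  # [0, 0, 1, 1]
```

Starting from a feasible grouping rather than random centers is the point of the first phase: the refinement only has to repair the assignment, not discover it.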

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / Vol. 20, No. 2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with those of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data.
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
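The network merging step in this abstract combines a user-article network and an article-issue network into a user-issue quasi-network. One common way to compose two two-mode networks is to multiply their incidence matrices; the sketch below assumes that approach, which the abstract does not confirm. The function name `merge_two_mode` and the small matrices are hypothetical.

```python
def merge_two_mode(user_article, article_issue):
    """Compose two two-mode networks by matrix product (an assumed merging rule).

    user_article[u][a]  : how often user u accessed article a
    article_issue[a][i] : 1 if article a is about issue i, else 0
    Returns user_issue[u][i], the total accesses of user u to articles on issue i.
    """
    n_articles = len(article_issue)
    n_issues = len(article_issue[0])
    return [
        [
            sum(row[a] * article_issue[a][i] for a in range(n_articles))
            for i in range(n_issues)
        ]
        for row in user_article
    ]


# Two users, three articles, two issues (illustrative data only).
user_article = [[1, 1, 0],
                [0, 0, 1]]
article_issue = [[1, 0],
                 [1, 0],
                 [0, 1]]
print(merge_two_mode(user_article, article_issue))  # [[2, 0], [0, 1]]
```

The resulting rows can then serve as per-user issue profiles for a clustering step such as PAM.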

Two-phase Content-based Image Retrieval Using the Clustering of Feature Vector (특징벡터의 끌러스터링 기법을 통한 2단계 내용기반 이미지검색 시스템)

  • 조정원;최병욱
    • Journal of the Institute of Electronics Engineers of Korea CI / Vol. 40, No. 3 / pp.171-180 / 2003
  • A content-based image retrieval (CBIR) system builds the image database using low-level features such as color, shape, and texture, and returns images similar to the one the user wants to retrieve when a retrieval request occurs. What the user is interested in is the response time, considering both the time to build the index database and the time to obtain the retrieval results for the query image. In a content-based image retrieval system, the similarity computation that compares a query with the images in the database takes most of the total response time. In this paper, we propose a two-phase search method that clusters the feature vectors in order to minimize the similarity computing time. Experimental results show that this two-phase search method is two times faster than the conventional full-search method using the original features of all images in the image database, while maintaining the same retrieval relevance as the conventional full-search method. The proposed method also becomes more effective as the number of images increases.
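A two-phase search over clustered feature vectors can be sketched as below. This is an assumption-laden illustration rather than the authors' system: it assumes the clusters and their centroids were computed in advance, uses squared Euclidean distance as the similarity measure, and searches only the single nearest cluster. The names `sq_dist` and `two_phase_search` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_phase_search(query, centroids, clusters, top_n=1):
    """Return the ids of the top_n most similar images.

    clusters[c] is a list of (image_id, feature_vector) pairs for centroids[c].
    """
    # Phase 1: compare the query with the cluster centroids only.
    nearest = min(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    # Phase 2: compare the query with the images of the selected cluster only.
    ranked = sorted(clusters[nearest], key=lambda item: sq_dist(query, item[1]))
    return [image_id for image_id, _ in ranked[:top_n]]


# Illustrative 2D features; real CBIR features would have many more dimensions.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = [
    [("a", (0.0, 1.0)), ("b", (1.0, 0.0))],
    [("c", (9.0, 10.0)), ("d", (11.0, 11.0))],
]
print(two_phase_search((10.0, 9.0), centroids, clusters, top_n=2))  # ['c', 'd']
```

The saving comes from skipping every image outside the selected cluster. A sketch this simple can miss relevant images that fall in a neighboring cluster, so it does not by itself guarantee the full-search relevance reported in the abstract.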

Heterogeneity-aware Energy-efficient Clustering (HEC) Technique for WSNs

  • Sharma, Sukhwinder;Bansal, Rakesh Kumar;Bansal, Savina
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 4 / pp.1866-1888 / 2017
  • Efficient energy consumption in WSNs is one of the key design issues for improving the network stability period. In this paper, we propose a new Heterogeneity-aware Energy-efficient Clustering (HEC) technique which considers two types of heterogeneity: that of network lifetime and that of sensor nodes. Selection of cluster head nodes is based on three network lifetime phases: in the initial phase, only advanced nodes are allowed to become cluster heads; in the second, active phase, all nodes are allowed to participate in the cluster head selection process with equal probability; and in the last, dying-out phase, clustering is relaxed by allowing direct transmission. Simulation-based performance analysis of the proposed technique as compared to other relevant techniques shows that HEC achieves a longer stable region, improved throughput, and better energy dissipation owing to judicious consumption of the additional energy of advanced nodes. On average, the improvement observed for the stability period over LEACH, SEP, FAIR and HEC- with SEP protocols is around 65%, 30%, 15% and 17% respectively. Further, the scalability of the proposed technique is tested by varying the field size and the number of sensing nodes. The results obtained are found to be quite optimistic. The impact of energy heterogeneity has also been assessed, and it is found to improve the stability period, though only up to a certain extent.
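The phase-dependent eligibility rule for cluster heads can be sketched as a small function. This is only an illustration of the three rules stated in the abstract: how the network decides which lifetime phase it is in, and how a head is then drawn from the eligible nodes, are not given there, so the phase is passed in as a plain string. The function name and the node dictionaries are hypothetical.

```python
def eligible_cluster_heads(nodes, phase):
    """Return the ids of nodes allowed to become cluster heads in a lifetime phase.

    nodes: list of dicts with keys "id" and "advanced" (True for advanced nodes).
    phase: "initial", "active", or "dying" (labels assumed for this sketch).
    """
    if phase == "initial":
        # Only advanced (higher-energy) nodes may become cluster heads.
        return [n["id"] for n in nodes if n["advanced"]]
    if phase == "active":
        # All nodes take part in cluster head selection on equal terms.
        return [n["id"] for n in nodes]
    if phase == "dying":
        # Clustering is relaxed: nodes transmit directly, so no heads are chosen.
        return []
    raise ValueError(f"unknown phase: {phase!r}")


nodes = [
    {"id": 1, "advanced": True},
    {"id": 2, "advanced": False},
    {"id": 3, "advanced": False},
]
print(eligible_cluster_heads(nodes, "initial"))  # [1]
print(eligible_cluster_heads(nodes, "active"))   # [1, 2, 3]
print(eligible_cluster_heads(nodes, "dying"))    # []
```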

Approximate Fuzzy Clustering Based on Density Functions (밀도함수를 이용한 근사적 퍼지 클러스처링)

  • 권석호;손세호
    • Journal of the Korean Institute of Intelligent Systems / Vol. 10, No. 4 / pp.285-292 / 2000
  • In general, exploratory data analysis consists of three processes: i) assessment of clustering tendency, ii) cluster analysis, and iii) cluster validation. This analysis method is computationally inefficient because it requires a number of iterations of steps ii) and iii) to converge. In this paper, we propose a density function-based approximate fuzzy clustering method with a hierarchical structure that consists of two phases: Phase I is a feature (i.e., number of clusters and cluster centers) extraction process based on the tendency assessment of the given data, and Phase II is a standard FCM with the cluster centers initialized by the results of Phase I. Numerical examples are presented to show the validity of the proposed clustering method.
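Phase II, a standard fuzzy c-means (FCM) run that starts from supplied cluster centers, can be sketched as follows. The sketch assumes the initial centers are simply given; the density-function procedure of Phase I is not described in enough detail in the abstract to reproduce. The function name `fcm`, the fuzzifier default m = 2, the fixed iteration count, and the sample data are illustrative choices.

```python
def fcm(points, centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy c-means starting from the given initial centers.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j as computed in the last iteration.
    """
    dim = len(points[0])
    centers = [tuple(c) for c in centers]
    u = []
    for _ in range(iters):
        # Membership update: u[i][j] is proportional to (squared distance)^(-1/(m-1)).
        u = []
        for p in points:
            inv = [
                (sum((p[d] - c[d]) ** 2 for d in range(dim)) + eps) ** (-1.0 / (m - 1))
                for c in centers
            ]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of all points weighted by u^m.
        centers = [
            tuple(
                sum(u[i][j] ** m * points[i][d] for i in range(len(points)))
                / sum(u[i][j] ** m for i in range(len(points)))
                for d in range(dim)
            )
            for j in range(len(centers))
        ]
    return centers, u


# 1D data with two obvious groups; the initial centers stand in for Phase I output.
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
final_centers, memberships = fcm(data, centers=[(2.0,), (8.0,)])
```

Because the number of clusters and reasonable starting centers are supplied up front, FCM is run once rather than repeated for several candidate cluster counts, which is the efficiency argument made in the abstract.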

Heuristic for the Pick-up and Delivery Vehicle Routing Problem: Case Study for the Remicon Truck Routing in the Metropolitan Area (배달과 수집을 수행하는 차량경로문제 휴리스틱에 관한 연구: 수도권 레미콘 운송사례)

  • Ji, Chang-Hun;Kim, Mi-Yi;Lee, Young-Hoon
    • Korean Management Science Review / Vol. 24, No. 2 / pp.43-56 / 2007
  • This paper studies a VRP (Vehicle Routing Problem) in which two different kinds of missions are to be completed. The objective is to minimize the total vehicle operating distance. A mixed integer programming formulation and a heuristic algorithm for practical use are suggested. The heuristic algorithm consists of three phases: clustering, route construction, and adjustment. In the first phase, customers are clustered so that the supply nodes are grouped with the demand nodes to be served by the same vehicle. Vehicle routes are generated within each cluster in the second phase. Clusters and routes are adjusted in the third phase using the UF (unfitness) rule, which is designed to determine the customers and the routes to be moved. Computational experiments show that the suggested heuristic algorithm performs well within a relatively short computational time.

Determining the Number and the Locations of RBF Centers Using Enhanced K-Medoids Clustering and Bi-Section Search Method (보정된 K-medoids 군집화 기법과 이분 탐색기법을 이용한 RBF 네트워크의 중심 개수와 위치와 통합 결정)

  • Lee, Daewon;Lee, Jaewook
    • Journal of Korean Institute of Industrial Engineers / Vol. 29, No. 2 / pp.172-178 / 2003
  • In recent research, a variety of ways of determining the locations of RBF centers have been proposed, assuming that the number of RBF centers is known. However, they also have many numerical drawbacks. We propose a new method to overcome such drawbacks. The strength of our method is that it determines the locations and the number of RBF centers at the same time without any assumption about the number of RBF centers. The proposed method consists of two phases. The first phase determines the number and the locations of RBF centers using a bi-section search method and enhanced k-medoids clustering, which overcomes drawbacks of the clustering algorithm. In the second phase, network weights are computed and the design of the RBF network is completed. This new method is applied to several benchmark data sets. Benchmark results show that the proposed method is competitive with the previously reported approaches for center selection.

A Honey-Hive based Efficient Data Aggregation in Wireless Sensor Networks

  • Ramachandran, Nandhakumar;Perumal, Varalakshmi
    • Journal of Electrical Engineering and Technology / Vol. 13, No. 2 / pp.998-1007 / 2018
  • The advent of Wireless Sensor Networks (WSN) has led to their use in numerous applications. Sensors are autonomous in nature and are constrained by limited resources. Designing an autonomous topology with criteria for economic and energy conservation is considered a major goal in WSN. The proposed honey-hive clustering consumes minimum energy and resources with minimal transmission delay compared to the existing approaches. The honey-hive approach consists of two phases. The first phase is an Intra-Cluster Min-Max Discrepancy (ICMMD) analysis, which is based on the local honey-hive data gathering technique, and the second phase is Inter-Cluster Frequency Matching (ICFM), which is based on the global optimal data aggregation. The proposed data aggregation mechanism increases the optimal connectivity range of the sensor node to a considerable degree for inter-cluster and intra-cluster coverage, with improved optimal energy conservation.