• Title/Summary/Keyword: optimal number of clusters (최적의 클러스터 수)

Adaptive Data Mining Model using Fuzzy Performance Measures (퍼지 성능 측정자를 이용한 적응 데이터 마이닝 모델)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB / v.13B no.5 s.108 / pp.541-546 / 2006
  • Data mining is the process of finding hidden patterns in a large data set, and cluster analysis is a popular technique for it. Cluster analysis is a fundamental step in data analysis and has played an important role in solving many problems in pattern recognition and image processing. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental decision on the number of clusters in the data. This decision is tied to the cluster validity problem, that is, how well a clustering has identified the structure present in the data. In this paper, we design an adaptive data mining model using fuzzy performance measures. It discovers clusters through an unsupervised neural network model based on a fuzzy objective function and evaluates the clustering results with a fuzzy performance measure. We also present experimental results on newsgroup data, which show that the proposed model can be used as a document classifier.
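
The abstract does not say which fuzzy performance measure is used. As a stand-in, the sketch below uses Bezdek's partition coefficient, a common fuzzy validity index, to compare clusterings with different numbers of clusters. The function names and the hand-written membership matrices are illustrative assumptions, not taken from the paper.

```python
def partition_coefficient(memberships):
    """Bezdek's partition coefficient: mean over points of the summed squared memberships.

    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row is assumed to sum to 1. The value lies between 1/c and 1 for
    c clusters, and values closer to 1 indicate a crisper partition.
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


def pick_cluster_count(candidates):
    """candidates maps a cluster count c to the membership matrix obtained with c clusters."""
    return max(candidates, key=lambda c: partition_coefficient(candidates[c]))


# Hypothetical membership matrices for three points (made up for illustration).
candidates = {
    2: [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]],
    3: [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]],
}
best_c = pick_cluster_count(candidates)
```

In practice the membership matrices would come from running a fuzzy clustering algorithm once per candidate cluster count; the index only ranks the results.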

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB / v.14B no.2 / pp.135-140 / 2007
  • Feature selection is a preprocessing technique commonly used on high-dimensional data. It studies how to select a subset or ranked list of attributes for constructing models that describe the data. Feature selection methods attempt to explore the intrinsic properties of the data by employing statistics or information theory. Recent developments include approaches such as correlation methods, dimensionality reduction, and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we provide a feature selection method that considers data characteristics and generalization capability. It offers a computational approach to feature selection based on fuzzy cluster analysis of attribute values and on its performance measures. We apply it to a system for classifying computer viruses and compare it with a heuristic method that uses the contrast concept. The experimental results show that the proposed approach can produce a feature ranking, select features, and improve system performance.

An Efficient Genetic Algorithm with Partial Evaluation by Clustering (개체 클러스터링을 이용한 효율적인 국소 평가 유전자 알고리즘)

  • 김희수;조성배
    • Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.6-8 / 2000
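  • When a genetic algorithm is applied to a problem, the population is generally kept as large as possible so that the optimal solution can be found. However, for certain problems in which evaluating an individual is relatively expensive, a large population becomes a serious burden. For this reason, this paper proposes a partial-evaluation genetic algorithm that uses clustering. The method divides the population into several clusters, evaluates only the representative individual of each cluster, and obtains the fitness values of the remaining individuals by indirect calculation, so it achieves the effect of maintaining a relatively large population with only a small number of evaluations. A performance comparison with a standard genetic algorithm showed that the proposed algorithm was efficient.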
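
The partial-evaluation idea (evaluate one representative per cluster and derive the other fitness values indirectly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: representatives are drawn at random, each individual joins the cluster of its nearest representative, and the indirect fitness is the representative's fitness discounted by distance. The abstract does not specify the clustering method or the indirect rule, so both are hypothetical here, as are all names.

```python
import math
import random


def partial_evaluate(population, fitness, k, rng):
    """Return a fitness estimate for every individual using only k true evaluations."""
    rep_ids = rng.sample(range(len(population)), k)           # cluster representatives
    rep_fit = {i: fitness(population[i]) for i in rep_ids}    # the only expensive calls
    estimates = []
    for ind in population:
        # Cluster assignment: the nearest representative in Euclidean distance.
        nearest = min(rep_ids, key=lambda i: math.dist(ind, population[i]))
        d = math.dist(ind, population[nearest])
        # Assumed indirect rule: discount the representative's fitness by distance.
        estimates.append(rep_fit[nearest] / (1.0 + d))
    return estimates


counter = {"calls": 0}


def costly_fitness(x):
    """Stand-in for an expensive evaluation; counts how often it is called."""
    counter["calls"] += 1
    return 1.0 / (1.0 + sum(v * v for v in x))   # in (0, 1], best at the origin


rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(40)]
estimates = partial_evaluate(population, costly_fitness, k=5, rng=rng)
```

Here 40 individuals receive a fitness estimate while `costly_fitness` runs only 5 times; a full genetic algorithm would feed these estimates into selection as usual.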

Feature Weighting in Projected Clustering for High Dimensional Data (고차원 데이타에 대한 투영 클러스터링에서 특성 가중치 부여)

  • Park, Jong-Soo
    • Journal of KIISE:Databases / v.32 no.3 / pp.228-242 / 2005
  • Projected clustering seeks to find clusters in different subspaces of a high-dimensional dataset. We propose an algorithm that discovers near-optimal projected clusters without user-specified parameters such as the number of output clusters or the average cardinality of the subspaces of the projected clusters. The objective function of the algorithm computes the projected energy, the quality, and the number of outliers at each step of clustering. To minimize the projected energy and maximize the quality of the clustering, we first find the best subspace of each cluster from the density of the input points by comparing standard deviations over the full dimension. The weighting factor for each dimension of the subspace is used to get rid of probable error in measuring projected distances. Our extensive experiments show that the algorithm discovers projected clusters accurately and scales to large data sets.

Optimization Protocol using Load Balancing for Hierarchical Wireless Sensor Network (무선센서네트워크에서 부하 균등화를 위한 클러스터링 최적화 프로토콜)

  • Choi, Hae-Won;Kim, Sang-Jin;Pye, Su-Young;Chang, Chu-Seock
    • Journal of Digital Convergence / v.11 no.10 / pp.419-429 / 2013
  • A wireless sensor network (WSN), which consists of a large number of sensors, aims to gather data in a variety of environments. The sensor nodes run on batteries of limited power, so extending network lifetime is a major research goal in WSNs. In this paper, we take minimum energy consumption and load balancing as the key points. The proposed protocol guarantees a balanced number of cluster member nodes by using a node memory threshold, and optimizes the distribution of cluster heads by using an optimized clustering method. The results show that the proposed protocol supports load balancing and high energy efficiency by forming clusters with a reasonable number of member nodes. The simulation results show that our scheme ensures a longer network lifetime than existing schemes such as LEACH and CBLM.

Automatic Design of Fuzzy Controller Using Clustering and Genetic Algorithm (클러스터링과 GA를 이용한 퍼지 제어기 설계 자동화)

  • Yoon, Yong-Seock;Kong, Seong-Gon
    • Proceedings of the KIEE Conference / 2000.07d / pp.2953-2955 / 2000
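  • This paper studies a method for automatically designing an optimal fuzzy controller when no expert knowledge is available. First, to set the rules of the fuzzy controller, the input-output data of an existing PID controller are clustered. The number of clusters is identified from the clustered data, and the number of fuzzy control rules is determined from it. Second, the parameters of the fuzzy controller are optimized with a genetic algorithm (GA). In the GA optimization, the error between the reference input and the system response and the magnitude of the overshoot are used as performance criteria, so that evolution is directed toward a controller with a fast and stable response. The performance of the resulting fuzzy controller is compared with that of the existing PID controller.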

A Text Summarization Model Based on Sentence Clustering (문장 클러스터링에 기반한 자동요약 모형)

  • 정영미;최상희
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.159-178 / 2001
  • This paper presents an automatic text summarization model that selects representative sentences from sentence clusters to create a summary. Summary generation experiments were performed on two sets of test documents after learning the optimum environment from a training set. The centroid clustering method turned out to be the most effective for clustering sentences, and sentence weight was found to be more effective than the similarity between the sentence and cluster centroid vectors for selecting a representative sentence from each cluster. The experimental results also show that inverse sentence weight and title word weight for terms, as well as location weight for sentences, are effective in improving summarization performance.

An Optimal Allocation Mechanism of Location Servers in A Linear Arrangement of Base Stations (선형배열 기지국을 위한 위치정보 서버의 최적할당 방식)

  • Lim, Kyung-Shik
    • The Transactions of the Korea Information Processing Society / v.7 no.2 / pp.426-436 / 2000
  • Given a linear arrangement of n base stations that generate multiple types of traffic among themselves, we consider the problem of finding a set of disjoint clusters that covers the n base stations, with a location server assigned to each cluster. Our goal is to minimize the total communication cost for the entire network, where the cost of intracluster communication is usually lower than that of intercluster communication for each type of traffic. The optimization problem is transformed into an equivalent problem using the concept of relative cost, which is the difference in communication cost between intracluster and intercluster communications. Using the relative cost matrix, an efficient $O(mn^2)$ algorithm, where m is the number of clusters in a partition, is designed by dynamic programming. The algorithm also finds all the valid partitions in the same polynomial time, given the size constraint on a cluster and the total allowable communication cost for the entire network.
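
Partitioning a line of stations into contiguous clusters is a classic fit for dynamic programming. The sketch below is a generic interval-partition recurrence, not the paper's relative-cost formulation: `cluster_cost(i, j)` is a hypothetical function giving the cost of one cluster that covers stations i through j-1, and the table entry `dp[k][j]` holds the cheapest way to cover the first j stations with k clusters. With n stations and m clusters, the three nested loops take on the order of m·n² steps.

```python
def best_partition(n, m, cluster_cost):
    """Split stations 0..n-1 into exactly m contiguous clusters with minimum total cost.

    cluster_cost(i, j) is an assumed cost of one cluster covering stations i..j-1.
    Returns (total_cost, [(start, end), ...]) with end exclusive.
    """
    inf = float("inf")
    # dp[k][j]: minimum cost of covering the first j stations with k clusters.
    dp = [[inf] * (n + 1) for _ in range(m + 1)]
    cut = [[0] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):          # last cluster covers stations i..j-1
                c = dp[k - 1][i] + cluster_cost(i, j)
                if c < dp[k][j]:
                    dp[k][j], cut[k][j] = c, i
    # Walk the stored cut points backwards to recover the cluster boundaries.
    bounds, j = [], n
    for k in range(m, 0, -1):
        i = cut[k][j]
        bounds.append((i, j))
        j = i
    return dp[m][n], bounds[::-1]


# Illustrative cost: the squared traffic load handled by one location server.
loads = [3, 3, 2, 2, 2]
cost, bounds = best_partition(len(loads), 2, lambda i, j: sum(loads[i:j]) ** 2)
```

For these made-up loads the cheapest two-cluster split is stations 0-1 and stations 2-4, with a total cost of 72.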

Analysis of the Load Balancing Algorithms according to the Request Patterns on the LVS Cluster Systems (LVS 클러스터 시스템의 요구 패턴에 따른 부하 분산 알고리즘 분석)

  • Li, Shan-Hong;Kim, Sung-Ki;Na, Yong-Hee;Min, Byoung-Joon
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.151-154 / 2002
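  • To cope with the ever-growing service demand from Internet users, cluster systems with load-balancing capability are increasingly used. This study analyzes and discusses the load-balancing response characteristics of the RR (Round Robin), WRR (Weighted Round Robin), LC (Least Connection), and WLC (Weighted Least Connection) load-balancing algorithms, which are used to give clients better response performance, under seven patterns of incoming client requests. To this end, we classified the variation in the volume of client requests arriving per unit time into seven patterns based on measurements from a real system, and obtained the load-balancing response characteristics for the seven request patterns on a Linux Virtual Server (LVS) cluster system. Through this study we were able to identify the optimal load-balancing algorithm for each pattern of change in client request volume. The results should serve as a useful reference for future research on efficient dynamic load balancing.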
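
To make the four scheduling policies concrete, here is a simplified sketch of how each one selects a server. It is not LVS code: the `Server` class and function names are hypothetical, the WRR variant simply repeats each server according to its weight, and the WLC variant uses a plain connections-to-weight ratio, which ignores details of the real LVS schedulers.

```python
from itertools import cycle


class Server:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight   # relative capacity, must be positive
        self.active = 0        # currently open connections


def round_robin(servers):
    """RR: hand out servers in a fixed rotating order."""
    return cycle(servers)


def weighted_round_robin(servers):
    """Naive WRR: each server appears `weight` times per rotation."""
    return cycle([s for s in servers for _ in range(s.weight)])


def least_connection(servers):
    """LC: pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active)


def weighted_least_connection(servers):
    """WLC: pick the server with the lowest connections-to-weight ratio."""
    return min(servers, key=lambda s: s.active / s.weight)


servers = [Server("a", weight=3), Server("b", weight=1)]

rr = round_robin(servers)
rr_order = [next(rr).name for _ in range(4)]
wrr = weighted_round_robin(servers)
wrr_order = [next(wrr).name for _ in range(4)]

# Dispatch 8 long-lived requests with WLC; none of them finishes in between.
for _ in range(8):
    weighted_least_connection(servers).active += 1
wlc_load = {s.name: s.active for s in servers}
```

With weights 3 and 1, the WLC loop ends with six connections on server "a" and two on server "b", matching the 3:1 capacity ratio, whereas plain RR would split the requests evenly regardless of capacity.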

Natural Scene Text Binarization using Tensor Voting and Markov Random Field (텐서보팅과 마르코프 랜덤 필드를 이용한 자연 영상의 텍스트 이진화)

  • Choi, Hyun Su;Lee, Guee Sang
    • Smart Media Journal / v.4 no.4 / pp.18-23 / 2015
  • In this paper, we propose a method for detecting the number of clusters. The method uses tensor voting to improve the performance of the Gaussian mixture model in the conventional Markov random field approach. The key point of the proposed method is that it extracts the number of cluster centers through the continuity of the saliency map of the input tokens of the tensor voting. First, we separate the foreground and background region candidates in a given natural image. Next, we extract the appropriate number of clusters for each candidate region by applying tensor voting. The detected number of clusters allows us to fit an accurate Gaussian mixture model. We then obtain the binarized text image of the natural scene by calculating the unary and pairwise terms of the Markov random field. The experiments confirm that the proposed method returns the optimal number of clusters and improves the text binarization results.