• Title/Summary/Keyword: Subjective clustering

Search Result 45, Processing Time 0.023 seconds

Empirical Comparisons of Clustering Algorithms using Silhouette Information

  • Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.10 no.1
    • /
    • pp.31-36
    • /
    • 2010
  • Many clustering algorithms have been used in diverse fields. When we need to group given data set into clusters, many clustering algorithms based on similarity or distance measures are considered. Most clustering works have been based on hierarchical and non-hierarchical clustering algorithms. Generally, for the clustering works, researchers have used clustering algorithms case by case from these algorithms. Also they have to determine proper clustering methods subjectively by their prior knowledge. In this paper, to solve the subjective problem of clustering we make empirical comparisons of popular clustering algorithms which are hierarchical and non hierarchical techniques using Silhouette measure. We use silhouette information to evaluate the clustering results such as the number of clusters and cluster variance. We verify our comparison study by experimental results using data sets from UCI machine learning repository. Therefore we are able to use efficient and objective clustering algorithms.

Fuzzy Clustering Model using Principal Components Analysis and Naive Bayesian Classifier (주성분 분석과 나이브 베이지안 분류기를 이용한 퍼지 군집화 모형)

  • Jun, Sung-Hae
    • The KIPS Transactions:PartB
    • /
    • v.11B no.4
    • /
    • pp.485-490
    • /
    • 2004
  • In data representation, the clustering performs a grouping process which combines given data into some similar clusters. The various similarity measures have been used in many researches. But, the validity of clustering results is subjective and ambiguous, because of difficulty and shortage about objective criterion of clustering. The fuzzy clustering provides a good method for subjective clustering problems. It performs clustering through the similarity matrix which has fuzzy membership value for assigning each object. In this paper, for objective fuzzy clustering, the clustering algorithm which joins principal components analysis as a dimension reduction model with bayesian learning as a statistical learning theory. For performance evaluation of proposed algorithm, Iris and Glass identification data from UCI Machine Learning repository are used. The experimental results shows a happy outcome of proposed model.

A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu;Congcong Zhao;Hao Chao
    • Journal of Information Processing Systems
    • /
    • v.19 no.6
    • /
    • pp.778-790
    • /
    • 2023
  • Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measurement. To address this limitation, we draw inspiration from the information bottleneck theory, and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments using ten carefully selected datasets and compared the results with three other algorithms. The experimental results demonstrate that our information bottleneck-based density peaks clustering (IBDPC) algorithm consistently achieves high levels of accuracy, highlighting its potential as a valuable tool for data clustering tasks.

Improvement of Support Vector Clustering using Evolutionary Programming and Bootstrap

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.8 no.3
    • /
    • pp.196-201
    • /
    • 2008
  • Statistical learning theory has three analytical tools which are support vector machine, support vector regression, and support vector clustering for classification, regression, and clustering respectively. In general, their performances are good because they are constructed by convex optimization. But, there are some problems in the methods. One of the problems is the subjective determination of the parameters for kernel function and regularization by the arts of researchers. Also, the results of the learning machines are depended on the selected parameters. In this paper, we propose an efficient method for objective determination of the parameters of support vector clustering which is the clustering method of statistical learning theory. Using evolutionary algorithm and bootstrap method, we select the parameters of kernel function and regularization constant objectively. To verify improved performances of proposed research, we compare our method with established learning algorithms using the data sets form ucr machine learning repository and synthetic data.

Research on the Hybrid Paragraph Detection System Using Syntactic-Semantic Analysis (구문의미 분석을 활용한 복합 문단구분 시스템에 대한 연구)

  • Kang, Won Seog
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.1
    • /
    • pp.106-116
    • /
    • 2021
  • To increase the quality of the system in the subjective-type question grading and document classification, we need the paragraph detection. But it is not easy because it is accompanied by semantic analysis. Many researches on the paragraph detection solve the detection problem using the word based clustering method. However, the word based method can not use the order and dependency relation between words. This paper suggests the paragraph detection system using syntactic-semantic relation between words with the Korean syntactic-semantic analysis. This system is the hybrid system of word based, concept based, and syntactic-semantic tree based detection. The experiment result of the system shows it has the better result than the word based system. This system will be utilized in Korean subjective question grading and document classification.

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.330-333
    • /
    • 2003
  • Initial cluster size for clustering of partitioning methods is very important to the clustering result. In K-means algorithm, the result of cluster analysis becomes different with optimal cluster size K. Usually, the initial cluster size is determined by prior and subjective information. Sometimes this may not be optimal. Now, more objective method is needed to solve this problem. In our research, we propose a hybrid genetic algorithm, a tree induction based evolution algorithm, for determination of optimal cluster size. Initial population of this algorithm is determined by the number of terminal nodes of tree induction. From the initial population based on decision tree, our optimal cluster size is generated. The fitness function of ours is defined an inverse of dissimilarity measure. And the bagging approach is used for saying computational time cost.

  • PDF

A Study on Three Phase Partial Discharge Pattern Classification with the Aid of Optimized Polynomial Radial Basis Function Neural Networks (최적화된 pRBF 뉴럴 네트워크에 이용한 삼상 부분방전 패턴분류에 관한 연구)

  • Oh, Sung-Kwun;Kim, Hyun-Ki;Kim, Jung-Tae
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.62 no.4
    • /
    • pp.544-553
    • /
    • 2013
  • In this paper, we propose the pattern classifier of Radial Basis Function Neural Networks(RBFNNs) for diagnosis of 3-phase partial discharge. Conventional methods map the partial discharge/noise data on 3-PARD map, and decide whether the partial discharge occurs or not from 3-phase or neutral point. However, it is decided based on his own subjective knowledge of skilled experter. In order to solve these problems, the mapping of data as well as the classification of phases are considered by using the general 3-PARD map and PA method, and the identification of phases occurring partial discharge/noise discharge is done. In the sequel, the type of partial discharge occurring on arbitrary random phase is classified and identified by fuzzy clustering-based polynomial Radial Basis Function Neural Networks(RBFNN) classifier. And by identifying the learning rate, momentum coefficient, and fuzzification coefficient of FCM fuzzy clustering with the aid of PSO algorithm, the RBFNN classifier is optimized. The virtual simulated data and the experimental data acquired from practical field are used for performance estimation of 3-phase partial discharge pattern classifier.

NOGSEC: A NOnparametric method for Genome SEquence Clustering (녹섹(NOGSEC): A NOnparametric method for Genome SEquence Clustering)

  • 이영복;김판규;조환규
    • Korean Journal of Microbiology
    • /
    • v.39 no.2
    • /
    • pp.67-75
    • /
    • 2003
  • One large topic in comparative genomics is to predict functional annotation by classifying protein sequences. Computational approaches for function prediction include protein structure prediction, sequence alignment and domain prediction or binding site prediction. This paper is on another computational approach searching for sets of homologous sequences from sequence similarity graph. Methods based on similarity graph do not need previous knowledges about sequences, but largely depend on the researcher's subjective threshold settings. In this paper, we propose a genome sequence clustering method of iterative testing and graph decomposition, and a simple method to calculate a strict threshold having biochemical meaning. Proposed method was applied to known bacterial genome sequences and the result was shown with the BAG algorithm's. Result clusters are lacking some completeness, but the confidence level is very high and the method does not need user-defined thresholds.

Fuzzy Clustering of Fuzzy Data using a Dissimilarity Measure (비유사도 척도를 이용한 퍼지 데이터에 대한 퍼지 클러스터링)

  • Lee, Geon-Myeong
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.9
    • /
    • pp.1114-1124
    • /
    • 1999
  • 클러스터링은 동일한 클러스터에 속하는 데이타들 간에는 유사도가 크도록 하고 다른 클러스터에 속하는 데이타들 간에는 유사도가 작도록 주어진 데이타를 몇 개의 클러스터로 묶는 것이다. 어떤 대상을 기술하는 데이타는 수치 속성뿐만 아니라 정성적인 비수치 속성을 갖게 되고, 이들 속성값은 관측 오류, 불확실성, 주관적인 판정 등으로 인해서 정확한 값으로 주어지지 않고 애매한 값으로 주어지는 경우가 많다. 본 논문에서는 애매한 값을 퍼지값으로 표현하는 수치 속성과 비수치 속성을 포함한 데이타에 대한 비유사도 척도를 제안하고, 이 척도를 이용하여 퍼지값을 포함한 데이타에 대하여 퍼지 클러스터링하는 방법을 소개한 다음, 이를 이용한 실험 결과를 보인다. Abstract The objective of clustering is to group a set of data into some number of clusters in a way to minimize the similarity between data belonging to different clusters and to maximize the similarity between data belonging to the same cluster. Many data for real world objects consist of numeric attributes and non-numeric attributes whose values are fuzzily described due to observation error, uncertainty, subjective judgement, and so on. This paper proposes a dissimilarity measure applicable to such data and then introduces a fuzzy clustering method for such data using the proposed dissimilarity measure. It also presents some experiment results to show the applicability of the proposed clustering method and dissimilarity measure.

A Hybrid Recommendation System based on Fuzzy C-Means Clustering and Supervised Learning

  • Duan, Li;Wang, Weiping;Han, Baijing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.7
    • /
    • pp.2399-2413
    • /
    • 2021
  • A recommendation system is an information filter tool, which uses the ratings and reviews of users to generate a personalized recommendation service for users. However, the cold-start problem of users and items is still a major research hotspot on service recommendations. To address this challenge, this paper proposes a high-efficient hybrid recommendation system based on Fuzzy C-Means (FCM) clustering and supervised learning models. The proposed recommendation method includes two aspects: on the one hand, FCM clustering technique has been applied to the item-based collaborative filtering framework to solve the cold start problem; on the other hand, the content information is integrated into the collaborative filtering. The algorithm constructs the user and item membership degree feature vector, and adopts the data representation form of the scoring matrix to the supervised learning algorithm, as well as by combining the subjective membership degree feature vector and the objective membership degree feature vector in a linear combination, the prediction accuracy is significantly improved on the public datasets with different sparsity. The efficiency of the proposed system is illustrated by conducting several experiments on MovieLens dataset.