• Title/Summary/Keyword: and clustering


Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal
    • /
    • v.43 no.1
    • /
    • pp.74-81
    • /
    • 2021
  • The mixture model is a powerful and flexible tool in clustering analysis. Based on the Dirichlet process and the parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Inference in the model relies on an efficient online variational Bayesian approach, which improves the exchange of information between the full dataset and individual batches and scales to large datasets. Experiments on a scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.
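The paper's own online variational inference is not reproduced here, but the core idea, a Dirichlet-process Gaussian mixture that infers the number of scene clusters from the data, can be sketched with scikit-learn's batch variational implementation (the synthetic 2-D "scene features", the truncation level of 10, and the 0.05 weight threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic "scene feature" clusters.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

# Truncated Dirichlet-process Gaussian mixture fitted by variational Bayes;
# surplus components are pruned automatically via near-zero weights.
dpgmm = BayesianGaussianMixture(
    n_components=10,  # truncation level, not the final cluster count
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
effective_k = int(np.sum(dpgmm.weights_ > 0.05))  # clusters with real mass
```

The variational objective drives unused mixture weights toward zero, which is how the nonparametric framework avoids fixing the number of clusters in advance.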

Identification of Plastic Wastes by Using Fuzzy Radial Basis Function Neural Networks Classifier with Conditional Fuzzy C-Means Clustering

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • v.11 no.6
    • /
    • pp.1872-1879
    • /
    • 2016
  • Techniques for recycling and reusing plastics have attracted public attention, and this attention has driven improvements in recycling technology. However, the identification of black plastic waste remains a serious problem because the spectrum extracted by near-infrared spectroscopy is unclear and contaminated by noise. To overcome this problem, we apply Raman spectroscopy to extract a clear spectrum of the plastic material. In addition, to improve the classification ability of fuzzy Radial Basis Function Neural Networks, we apply a supervised clustering method instead of an unsupervised one. The conditional fuzzy C-Means clustering method, a supervised clustering algorithm, is used to determine the locations of the radial basis functions. Conditional fuzzy C-Means clustering analyzes the data distribution over the input space under the supervision of auxiliary information, which is defined using a k-Nearest-Neighbor approach.
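As a rough sketch of the conditional fuzzy C-Means step (not the authors' exact formulation), the update below constrains each sample's memberships to sum to its auxiliary value f[k]; with a uniform condition f ≡ 1 it reduces to ordinary fuzzy C-Means. The farthest-point initialisation and the toy data are assumptions for illustration:

```python
import numpy as np

def conditional_fcm(X, f, c, m=2.0, n_iter=50, eps=1e-9):
    """Conditional fuzzy C-Means: sample k's memberships sum to f[k],
    so the auxiliary information steers its influence on the prototypes."""
    # Farthest-point initialisation of the c prototypes.
    centers = [X[0]]
    for _ in range(c - 1):
        dmin = np.min(np.linalg.norm(
            X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        centers.append(X[np.argmax(dmin)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Distances from every sample to every prototype.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        inv = d ** (-2.0 / (m - 1.0))
        u = f[:, None] * inv / inv.sum(axis=1, keepdims=True)
        um = u ** m
        centers = um.T @ X / um.sum(axis=0)[:, None]
    return u, centers

# Toy run with a uniform condition f = 1 (ordinary fuzzy C-Means).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
u, centers = conditional_fcm(X, f=np.ones(len(X)), c=2)
```

In the paper's setting the converged prototypes would then serve as the locations of the radial basis functions.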

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San;Kim, Chansoo;Warde, William D.
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.2
    • /
    • pp.369-380
    • /
    • 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering, or one close to the original clustering. The recovery level achieved using Euclidean distance and distances transformed from correlation coefficients is evaluated and compared using Rand's (1971) C statistic. The C values indicate how close the resultant clustering is to the original clustering. In a simulation study, the recovery level is improved by applying correlation coefficients between objects. Using the data set from Spellman et al. (1998), the recovery levels under different similarity measures are also presented. In general, the recovery level of the true clusters increased when correlation coefficients were used.
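A minimal illustration of the comparison, assuming toy sinusoidal profiles in place of the paper's simulated or Spellman et al. data: cluster once with Euclidean distance and once with the correlation-based distance 1 − r, and score each result against the true grouping with the Rand index:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import rand_score

rng = np.random.default_rng(1)
# Two groups of noisy "functional" profiles built from two base curves.
t = np.linspace(0, 1, 30)
X = np.vstack([np.sin(2 * np.pi * t) + rng.normal(0, 0.2, (10, t.size)),
               np.cos(2 * np.pi * t) + rng.normal(0, 0.2, (10, t.size))])
truth = np.repeat([0, 1], 10)

def recovery(D):
    """Average-linkage clustering into 2 groups, scored by the Rand index."""
    labels = fcluster(linkage(D, method="average"), 2, criterion="maxclust")
    return rand_score(truth, labels)

euclidean_recovery = recovery(pdist(X))                          # plain distance
correlation_recovery = recovery(pdist(X, metric="correlation"))  # 1 - r
```

The correlation metric compares the shape of the profiles rather than their magnitude, which is why it tends to recover curve-shaped clusters well.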

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.540-550
    • /
    • 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity using a Gaussian kernel function or an undirected k-nearest-neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform the traditional spectral clustering algorithms.
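The count-based variant of the shared-nearest-neighbor similarity can be sketched as follows (a simplified reading, not the authors' code; the two-blob data and k = 10 are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.neighbors import NearestNeighbors

def snn_affinity(X, k):
    """Affinity S[i, j] = number of nearest neighbours shared by i and j
    in a directed kNN graph (the count-based variant)."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, idx[i, 1:]] = 1.0  # skip idx[i, 0], the point itself
    S = A @ A.T                  # shared-neighbour counts
    np.fill_diagonal(S, 0.0)
    return S

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(snn_affinity(X, k=10))
```

Because the affinity counts shared neighbours rather than raw distances, it adapts to local density and only depends on the single parameter k, as the abstract emphasises.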

Refining Initial Seeds using Max Average Distance for K-Means Clustering (K-Means 클러스터링 성능 향상을 위한 최대평균거리 기반 초기값 설정)

  • Lee, Shin-Won;Lee, Won-Hee
    • Journal of Internet Computing and Services
    • /
    • v.12 no.2
    • /
    • pp.103-111
    • /
    • 2011
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. When the number of documents is huge, hierarchical clustering takes too much time. In this paper we deal with the K-Means algorithm, a partitioning method that is well suited to clustering many documents rapidly and easily. We propose a new method of selecting the initial seeds in the K-Means algorithm, in which the initial seeds are chosen to lie as far away from each other as possible.
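One plausible reading of the max-average-distance seeding (the paper's exact rule may differ) is a greedy selection that keeps adding the point with the largest average distance to the seeds chosen so far:

```python
import numpy as np
from sklearn.cluster import KMeans

def max_avg_distance_seeds(X, k):
    """Greedy seeding: start from the point farthest from the data mean,
    then repeatedly add the point whose average distance to the already
    chosen seeds is largest."""
    seeds = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    while len(seeds) < k:
        avg = np.linalg.norm(
            X[:, None, :] - X[seeds][None, :, :], axis=2).mean(axis=1)
        avg[seeds] = -np.inf  # never re-pick an existing seed
        seeds.append(int(np.argmax(avg)))
    return X[seeds]

rng = np.random.default_rng(3)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.3, (30, 2)) for c in blob_centers])

seeds = max_avg_distance_seeds(X, 3)
km = KMeans(n_clusters=3, init=seeds, n_init=1).fit(X)  # seed K-Means with them
```

Spreading the seeds apart in this way makes it likely that each true cluster receives its own starting centroid, which is the effect the paper attributes to its initialisation.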

Performance of Collaborative Filtering Agent System using Clustering for Better Recommendations (개선된 추천을 위해 클러스터링을 이용한 협동적 필터링 에이전트 시스템의 성능)

  • Hwang, Byeong-Yeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.5S
    • /
    • pp.1599-1608
    • /
    • 2000
  • Automated collaborative filtering is on the verge of becoming a popular technique both for reducing information overload and for solving problems that content-based information filtering systems cannot handle. In this paper, we describe three different collaborative filtering algorithms: GroupLens, the traditional technique; Best N, a modified one; and an algorithm that uses clustering. Based on experimental results using real data, the clustering algorithm is compared with representative collaborative filtering agent algorithms such as GroupLens and Best N. The results indicate that the clustering algorithm is similar to Best N and better than GroupLens in prediction accuracy. They also show that the clustering algorithm produces the best performance with respect to the standard deviation of the error rate, meaning that it gives the most stable and uniform recommendations. In addition, the clustering algorithm reduces recommendation time.
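The clustering-based idea, predicting a user's unseen rating from the users in the same cluster, can be sketched as follows (a toy illustration, not the paper's GroupLens/Best N pipeline; the rating patterns are invented):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two taste groups over 8 items; ratings are the group pattern plus noise.
pattern = np.array([[5, 5, 5, 5, 1, 1, 1, 1],
                    [1, 1, 1, 1, 5, 5, 5, 5]], dtype=float)
group = np.repeat([0, 1], 20)
R = pattern[group] + rng.integers(-1, 2, (40, 8))

# Cluster users by their rating vectors; predict a user's rating for an item
# as the mean rating the other users in the same cluster gave that item.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(R)
user, item = 0, 7
mates = (km.labels_ == km.labels_[user]) & (np.arange(len(R)) != user)
prediction = R[mates, item].mean()
```

Restricting the neighbourhood to one precomputed cluster is also what shortens recommendation time relative to scanning all users, the speed-up the abstract reports.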


Clustering non-stationary advanced metering infrastructure data

  • Kang, Donghyun;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.2
    • /
    • pp.225-238
    • /
    • 2022
  • In this paper, we propose a clustering method for advanced metering infrastructure (AMI) data in Korea. As AMI data present non-stationarity, we consider time-dependent frequency-domain principal component analysis, an appropriate method for locally stationary time series data. We develop a new clustering method based on time-varying eigenvectors, and our method provides a meaningful result that differs from the clustering results obtained with conventional methods such as K-means and K-centres functional clustering. A simulation study demonstrates the superiority of the proposed approach. We further apply the clustering results to the evaluation of the electricity pricing system in South Korea and validate the reform of the progressive electricity tariff system.

Classification Tree-Based Feature-Selective Clustering Analysis: Case of Credit Card Customer Segmentation (분류나무를 활용한 군집분석의 입력특성 선택: 신용카드 고객세분화 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.19 no.4
    • /
    • pp.1-11
    • /
    • 2023
  • Clustering analysis is used in various fields, including customer segmentation, and clustering methods such as k-means are actively applied to credit card customer segmentation. In this paper, we summarize an input-feature selection method for k-means clustering in the context of the credit card customer segmentation problem and evaluate its feasibility through the analysis results. By using the label values of the k-means clustering results as the target feature of a decision tree classifier, we compose a method for prioritizing input features by the information gain of each branch. Effectiveness is not easy to establish with a single cluster-validity index, but for the CH index, cluster validity improves evidently under the presented method compared to randomly determined priorities. The suggested method can improve the effectiveness of widely used clustering analyses, including the k-means method.
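The feature-prioritization step can be sketched as follows, assuming scikit-learn's impurity-based feature importances as a stand-in for the branch information gain described in the paper (the customer features here are synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
# Hypothetical customer table: columns 0-1 carry the segment structure,
# columns 2-3 are pure noise.
informative = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
X = np.hstack([informative, rng.normal(0, 1, (100, 2))])

cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A tree trained to reproduce the cluster labels; its entropy-based feature
# importances rank the inputs by how much they explain the segmentation.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, cluster_labels)
ranking = np.argsort(tree.feature_importances_)[::-1]
```

Features at the top of the ranking are the ones the clustering actually relies on; re-running k-means on only those features is the pruning step the paper evaluates with the CH index.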

THE FUZZY CLUSTERING ALGORITHM AND SELF-ORGANIZING NEURAL NETWORKS TO IDENTIFY POTENTIALLY FAILING BANKS

  • Lee, Gi-Dong
    • The Society of Digital Policy &amp; Management: Conference Proceedings
    • /
    • 2005.06a
    • /
    • pp.485-493
    • /
    • 2005
  • Using 1991 FDIC financial statement data, we develop fuzzy clusters of the data set. We also identify the distinctive characteristics of the fuzzy clustering algorithm and compare the closest hard-partitioning result of the fuzzy clustering algorithm with the outcomes of two self-organizing neural networks. When nine clusters are used, our analysis shows that the fuzzy clustering method distinctly groups failed and extreme performance banks from control (healthy) banks. The experimental results also show that the fuzzy clustering method and the self-organizing neural networks are promising tools in identifying potentially failing banks.


Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.315-321
    • /
    • 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to abandon all the data for a gene, even if the rest of that gene's data are intact, so the quality of the analysis may decrease seriously as the missing rate increases. Conversely, imputing the missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compares the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. It also tests the clustering efficiency of several imputation methods, including our proposed algorithm. The results show that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
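One way to cluster without imputing, in the spirit of the alternative the paper advocates, is a pairwise-complete distance that uses only the coordinates observed in both genes (a sketch over assumed toy expression profiles, not the paper's algorithm):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(6)
wave = 3 * np.sin(np.linspace(0, 6, 20))
X = np.vstack([wave + rng.normal(0, 1, (15, 20)),
               -wave + rng.normal(0, 1, (15, 20))])
truth = np.repeat([0, 1], 15)
M = X.copy()
M[rng.random(M.shape) < 0.1] = np.nan   # ~10% missing completely at random

def pairwise_complete_dist(M):
    """Euclidean distance over coordinates observed in both rows, rescaled
    to the full profile length, so no imputation is required."""
    n, p = M.shape
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ok = ~np.isnan(M[i]) & ~np.isnan(M[j])
            D[i, j] = D[j, i] = np.sqrt(
                ((M[i, ok] - M[j, ok]) ** 2).sum() * p / ok.sum())
    return D

clusters = fcluster(linkage(squareform(pairwise_complete_dist(M)), "average"),
                    2, "maxclust")
score = adjusted_rand_score(truth, clusters)
```

At moderate missing rates this recovers the true grouping without introducing the imputation artifacts the abstract warns about.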