• Title/Summary/Keyword: over-clustering

Practical Privacy-Preserving DBSCAN Clustering Over Horizontally Partitioned Data (다자간 환경에서 프라이버시를 보호하는 효율적인 DBSCAN 군집화 기법)

  • Kim, Gi-Sung;Jeong, Ik-Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.3 / pp.105-111 / 2010
  • We propose a practical privacy-preserving clustering protocol over horizontally partitioned data. We extend the DBSCAN clustering algorithm into a distributed protocol in which data providers mix real data with fake data to provide privacy. Our protocol is very efficient, whereas previous privacy-preserving protocols for distributed environments are not practical for real applications. Its efficiency over horizontally partitioned data is comparable to that of privacy-preserving clustering protocols in non-distributed environments.

Design of Radial Basis Function with the Aid of Fuzzy KNN and Conditional FCM (퍼지 kNN과 Conditional FCM을 이용한 퍼지 RBF의 설계)

  • Roh, Seok-Beon;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.6 / pp.1223-1229 / 2009
  • The performance of Radial Basis Function Neural Networks depends on how the radial basis functions are placed over the input space, which is an important step in designing such networks. The existing method for initializing the locations of the radial basis functions over the input space is conditional fuzzy C-means clustering. However, researchers who use conditional fuzzy C-means clustering cannot obtain modeling performance as good as they expect, because it cannot project the information extracted over the output space into the input space. To compensate for this drawback, we apply a fuzzy K-nearest neighbors approach to project the auxiliary information defined over the output space into the input space without loss of information.

Local Distribution Based Density Clustering for Speaker Diarization (화자분할을 위한 지역적 특성 기반 밀도 클러스터링)

  • Rho, Jinsang;Shon, Suwon;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.303-309 / 2015
  • Speaker diarization is the task of determining the speakers for unlabeled data, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used in the field of speaker diarization for its simplicity and computational efficiency. One challenging issue, however, is that if different clusters in a non-spatial dataset are adjacent to each other, over-clustering may occur, which subsequently degrades the performance of DBSCAN. In this paper, we identify the drawbacks of DBSCAN and propose a new density clustering algorithm based on the local distribution around each object. Variable density criteria for the local density and spreadness of each object are used for effective data clustering. We compare the proposed algorithm to DBSCAN in terms of clustering accuracy. Experimental results confirm that the proposed algorithm achieves higher accuracy than DBSCAN without over-clustering, and that the new approach based on local density and object spreadness is effective.

Mobility-Based Clustering Algorithm for Multimedia Broadcasting over IEEE 802.11p-LTE-enabled VANET

  • Syfullah, Mohammad;Lim, Joanne Mun-Yee;Siaw, Fei Lu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1213-1237 / 2019
  • Vehicular Ad-hoc Networks (VANETs) facilitate the envisioned future Intelligent Transportation Systems (ITSs) by providing inter-vehicle communication for purposes such as road surveillance, traffic information, and road conditions. In recent years, vehicle manufacturers, researchers and academicians have devoted significant attention to vehicular communication technology because of its highly dynamic connectivity and self-organized, decentralized networking characteristics. However, due to VANET's high mobility, dynamic network topology and low communication coverage, dissemination of large data packets (e.g. multimedia content) is challenging. Clustering enhances network performance by maintaining communication link stability, sharing network resources and efficiently using bandwidth among nodes. This paper proposes a mobility-based, multi-hop clustering algorithm (MBCA) for multimedia content broadcasting over an IEEE 802.11p-LTE-enabled hybrid VANET architecture. The OMNeT++ network simulator and a SUMO traffic generator are used to simulate a network scenario. The simulation results indicate that the proposed clustering algorithm over a hybrid VANET architecture improves the overall network stability and performance during multimedia content dissemination: compared with the referenced schemes [46], [47] and [50], it yields a 20% longer cluster head duration, a 20% longer cluster member duration, lower cluster overhead, a 15% higher data packet delivery ratio and lower network delay.

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.6 no.4 / pp.434-438 / 2008
  • Massive data are continuously produced at rates exceeding several terabytes per day. Applications that handle such data need effective clustering algorithms to achieve high overall computational performance. In this paper, we propose an approach to clustering that uses an ancestor as the cluster center: a K-means algorithm modified to use ancestors. We present a clustering architecture and a clustering algorithm that minimize the number of I/Os and deliver excellent performance. In our experimental performance evaluation, we show that our algorithm can improve the I/O speed and the query processing time.

Sub-class Clustering of Land Cover over Asia considering 9-year NDVI and Climate Data

  • Lee, Ga-Lam;Han, Kyung-Soo;Kim, Do-Yong
    • Korean Journal of Remote Sensing / v.27 no.3 / pp.289-301 / 2011
  • This paper attempts to classify land cover over Asia considering climatic and vegetative characteristics. Sub-class clustering based on the 13 MODIS land cover classes (excluding water) over Asia was performed with the climate map and the NDVI derived from SPOT 5 VGT D10 data. The unsupervised classification for the sub-class clustering was performed within each land cover class, and a total of 74 clusters were determined over the study area. Using these clusters, the annual variations (from 1999 to 2007) of precipitation rate and temperature were analyzed, as an example, with a simple linear regression model. The annual variations differed by cluster (showing negative or positive patterns) because of the various climate zones and NDVI annual cycles. Therefore, the detailed land cover map produced by the sub-class clustering in this study can be useful for modeling work that requires detailed climatic and vegetative information as a boundary condition.

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal / v.43 no.1 / pp.74-81 / 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.27 no.4 / pp.431-443 / 2020
  • The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm does not have reproducibility and nestness, which will be discussed in this paper. These are clear deficits of the K-means clustering algorithm. The hierarchical clustering algorithm has both reproducibility and nestness, but an intensive comparison between the K-means and hierarchical clustering algorithms has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large vocabulary speech recognition, a sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies it over the pre-clustered triphones using the C4.5 algorithm, which is known to effectively search through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, the data-driven method is used as the clustering algorithm, while its result is used as the learning target of the C4.5 algorithm. In terms of recognition performance, this scheme has been shown to be particularly effective on databases with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 3.93% compared to the existing rule-based methods.
