• Title/Summary/Keyword: and clustering

Search Result 5,621, Processing Time 0.029 seconds

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

  • Park, Min Young;Park, Seyoung
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.4
    • /
    • pp.511-526
    • /
    • 2020
  • Single-cell RNA-sequencing (scRNA-seq) data consists of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges when applying traditional clustering methods because they have many missing values and high level of noise due to technical and sampling issues. In this paper, motivated by analyzing scRNA-seq data, we propose a novel spectral-based clustering method by imposing different weights on genes when computing a similarity between cells. Assigning weights on genes and clustering cells are performed simultaneously in the proposed clustering framework. We solve the proposed non-convex optimization using an iterative algorithm. Both real data application and simulation study suggest that the proposed clustering method better identifies underlying clusters compared with existing clustering methods.

The Experimental Study on the Relationship between Hierarchical Agglomerative Clustering and Compound Nouns Indexing (계층적 결합형 문서 클러스터링 시스템과 복합명사 색인방법과의 연관관계 연구)

  • Cho Hyun-Yang;Choi Sung-Pil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.38 no.4
    • /
    • pp.179-192
    • /
    • 2004
  • In this paper, we present that the result of document clustering can change dramatically with respect to the different ways of indexing compound nouns. First of all, the automatic indexing engine specialized for Korean words analysis, which also serves as the backbone engine for automatic document clustering system, is introduced. Then, the details of hierarchical agglomerative clustering(HAC) method, one of the widely used clustering methodologies in these days, was illustrated. As the result of observing the experiments, carried out in the final part of this paper, it comes to the conclusion that the various modes of indexing compound nouns have an effect on the outcome of HAC.

Practical Privacy-Preserving DBSCAN Clustering Over Horizontally Partitioned Data (다자간 환경에서 프라이버시를 보호하는 효율적인 DBSCAN 군집화 기법)

  • Kim, Gi-Sung;Jeong, Ik-Rae
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.20 no.3
    • /
    • pp.105-111
    • /
    • 2010
  • We propose a practical privacy-preserving clustering protocol over horizontally partitioned data. We extend the DBSCAN clustering algorithm into a distributed protocol in which data providers mix real data with fake data to provide privacy. Our privacy-preserving clustering protocol is very efficient whereas the previous privacy-preserving protocols in the distributed environments are not practical to be used in real applications. The efficiency of our privacy-preserving clustering protocol over horizontally partitioned data is comparable with those of privacy-preserving clustering protocols in the non-distributed environments.

Design and Development of Clustering Algorithm Considering Influences of Spatial Objects (공간객체의 영향력을 고려한 클러스터링 알고리즘의 설계와 구현)

  • Kim, Byung-Cheol
    • The Journal of the Korea Contents Association
    • /
    • v.6 no.12
    • /
    • pp.113-120
    • /
    • 2006
  • This paper proposes DBSCAN-SI that is an algorithm for clustering with influences of spatial objects. DBSCAN-SI that is extended from existing DBSCAN and DBSCAN-W converts from non-spatial properties to the influences of spatial objects during the spatial clustering. It increases probability of inclusion to the cluster according to the higher the influences that is affected by the properties used in clustering and executes the clustering not only respect the spatial distances, but also volume of influences. For the perspective of specific property-centered, the clustering technique proposed in this paper can makeup the disadvantage of existing algorithms that exclude the objects in spite of high influences from cluster by means of being scarcely close objects around the cluster.

  • PDF

Color Image Segmentation Using Anisotropic Diffusion and Agglomerative Hierarchical Clustering (비등방형 확산과 계층적 클러스터링을 이용한 칼라 영상분할)

  • 김대희;안충현;호요성
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.377-380
    • /
    • 2003
  • A new color image segmentation scheme is presented in this paper. The proposed algorithm consists of image simplification, region labeling and color clustering. The vector-valued diffusion process is performed in the perceptually uniform LUV color space. We present a discrete 3-D diffusion model for easy implementation. The statistical characteristics of each labeled region are employed to estimate the number of total clusters and agglomerative hierarchical clustering is performed with the estimated number of clusters. Since the proposed clustering algorithm counts each region as a unit, it does not generate oversegmentation along region boundaries.

  • PDF

A Study on Multi-Dimensional Entity Clustering Using the Objective Function of Centroids (중심체 목적함수를 이용한 다차원 개체 CLUSTERING 기법에 관한 연구)

  • Rhee, Chul;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.15 no.2
    • /
    • pp.1-15
    • /
    • 1990
  • A mathematical definition of the cluster is suggested. A nonlinear 0-1 integer programming formulation for the multi-dimensional entity clustering problem is developed. A heuristic method named MDEC (Multi-Dimensional Entity Clustering) using centroids and the binary partition is developed and the numerical examples are shown. This method has an advantage of providing bottle-neck entity informations.

  • PDF

A study on electricity demand forecasting based on time series clustering in smart grid (스마트 그리드에서의 시계열 군집분석을 통한 전력수요 예측 연구)

  • Sohn, Hueng-Goo;Jung, Sang-Wook;Kim, Sahm
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.193-203
    • /
    • 2016
  • This paper forecasts electricity demand as a critical element of a demand management system in Smart Grid environment. We present a prediction method of using a combination of predictive values by time series clustering. Periodogram-based normalized clustering, predictive analysis clustering and dynamic time warping (DTW) clustering are proposed for time series clustering methods. Double Seasonal Holt-Winters (DSHW), Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal components (TBATS), Fractional ARIMA (FARIMA) are used for demand forecasting based on clustering. Results show that the time series clustering method provides a better performances than the method using total amount of electricity demand in terms of the Mean Absolute Percentage Error (MAPE).

Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map

  • Dumlao, Menchita F.;Oh, Byung-Joo
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.217-224
    • /
    • 2008
  • This paper provides a framework for semantic correspondence of heterogeneous databases using self- organizing map. It solves the problem of overlapping between different databases due to their different schemas. Clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance when using different kinds of data. Preprocessing of database is performed prior to clustering using edit distance algorithm, principal component analysis (PCA), and normalization function to identify the features necessary for clustering.

  • PDF

Simultaneous Approach to Fuzzy Clustering and Quantification of Categorical Data with Missing Values

  • Honda, Katsuhiro;Nakamura, Yoshihito;Ichihashi, Hidetomo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.36-39
    • /
    • 2003
  • This paper proposes a simultaneous application of homogeneity analysis and fuzzy clustering with in complete data. Taking the similarity between the loss of homogeneity in homogeneity analysis and the least squares criterion in principal component analysis into account, the new objective function is defined in a similar formulation to the linear fuzzy clustering with missing values. Numerical experiment shows the characteristic properties of the proposed method.

  • PDF

Fuzzy k-Means Local Centers of the Social Networks

  • Woo, Won-Seok;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.2
    • /
    • pp.213-217
    • /
    • 2012
  • Fuzzy k-means clustering is an attractive alternative to the ordinary k-means clustering in analyzing multivariate data. Fuzzy versions yield more natural output by allowing overlapped k groups. In this study, we modify a fuzzy k-means clustering algorithm to be used for undirected social networks, apply the algorithm to both real and simulated cases, and report the results.