• Title/Summary/Keyword: Consensus clustering

Consensus Clustering for Time Course Gene Expression Microarray Data

  • Kim, Seo-Young;Bae, Jong-Sung
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.335-348 / 2005
  • The rapid development of microarray technologies has enabled the monitoring of expression levels of thousands of genes simultaneously. Recently, time course gene expression data are often measured to study dynamic biological systems and gene regulatory networks. For such data, biologists attempt to group genes based on the temporal pattern of their expression levels. We apply the consensus clustering algorithm to time course gene expression data in order to infer statistically meaningful information from the measurements. We evaluate both consensus clustering and existing clustering methods with various validation measures. In this paper, we consider hierarchical clustering and Diana as existing methods, and consensus clustering based on hierarchical clustering, Diana, and a mixture of hierarchical clustering and Diana, and evaluate their performance on a real microarray data set and two simulated data sets.
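The consensus step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes average-linkage agglomerative clustering as the only base method (Diana is not shown), Euclidean distance between expression profiles, and repeated subsampling of genes. The names `agglomerative`, `consensus_matrix`, `consensus_clustering`, `n_resamples`, and `frac` are hypothetical.

```python
import math
import random


def agglomerative(dist, k):
    """Average-linkage agglomerative clustering on a square distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def consensus_matrix(profiles, k, n_resamples=30, frac=0.8, seed=0):
    """Fraction of resamples in which two genes fall in the same cluster."""
    rng = random.Random(seed)
    n = len(profiles)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(k, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        sub = [[math.dist(profiles[i], profiles[j]) for j in idx] for i in idx]
        labels = agglomerative(sub, k)
        for a in range(m):
            for b in range(m):
                sampled[idx[a]][idx[b]] += 1
                if labels[a] == labels[b]:
                    together[idx[a]][idx[b]] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def consensus_clustering(profiles, k, **kwargs):
    """Cluster genes using 1 - consensus as the distance."""
    cons = consensus_matrix(profiles, k, **kwargs)
    dist = [[1.0 - c for c in row] for row in cons]
    return agglomerative(dist, k)


# Toy time course profiles (made-up values): three rising, three falling.
profiles = [
    [0, 1, 2, 3], [0.1, 1.1, 2.1, 2.9], [0, 0.9, 2.2, 3.1],
    [3, 2, 1, 0], [3.1, 2.1, 0.9, 0], [2.9, 1.9, 1.1, 0.2],
]
print(consensus_clustering(profiles, k=2))
```

Using one minus the consensus value as a distance lets the same agglomerative routine serve as both the base method and the final grouping step; a mixed variant would simply pool co-clustering counts from more than one base method.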

The Similarity Plot for Comparing Clustering Methods (군집분석 방법들을 비교하기 위한 상사그림)

  • Jang, Dae-Heung
    • The Korean Journal of Applied Statistics / v.26 no.2 / pp.361-373 / 2013
  • There is a wide variety of clustering algorithms; consequently, we need a measure of similarity between two clustering methods. Such a measure can be used to compare how different clustering algorithms perform on a set of data. The more clustering algorithms are compared, the more pairwise values of the similarity measure there are. Thus, we need a simple tool that presents the many values of a similarity measure in order to compare many clustering methods. We suggest some graphical tools for comparing many clustering methods.
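To make the idea of a similarity measure between two clusterings concrete, here is a small sketch. The abstract does not say which measure is used, so the Rand index is assumed here purely as an example; `rand_index` and `similarity_table` are hypothetical helper names.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of item pairs on which two clusterings agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def similarity_table(clusterings):
    """Rand index for every pair of named clusterings."""
    return {
        (a, b): rand_index(clusterings[a], clusterings[b])
        for a, b in combinations(sorted(clusterings), 2)
    }


# Made-up labelings of six items by three methods.
table = similarity_table({
    "method_1": [0, 0, 0, 1, 1, 1],
    "method_2": [1, 1, 1, 0, 0, 0],
    "method_3": [0, 0, 1, 1, 2, 2],
})
for pair, value in table.items():
    print(pair, round(value, 3))
```

With m methods the table holds m(m-1)/2 values, which is why a graphical summary becomes useful as m grows.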

Fast Outlier Removal for Image Registration based on Modified K-means Clustering

  • Soh, Young-Sung;Qadir, Mudasar;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing / v.16 no.1 / pp.9-14 / 2015
  • Outlier detection and removal is a crucial step in various image processing applications such as image registration. Random Sample Consensus (RANSAC) is known to be the best algorithm so far for outlier detection and removal. However, RANSAC requires considerable computation time. To drastically reduce the computation time while preserving comparable quality, an outlier detection and removal method based on modified K-means is proposed. The original K-means is first applied to the matching point pairs, and then cluster merging and member exclusion steps are performed in the modification step. We applied the method to various images with highly repetitive patterns under several geometric distortions and obtained successful results. We compared the proposed method with RANSAC and showed that the proposed method runs 3 to 10 times faster than RANSAC.

A new clustering algorithm based on the connected region generation

  • Feng, Liuwei;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2619-2643 / 2018
  • In this paper, a new clustering algorithm based on the connected region generation (CRG-clustering) is proposed. It is an effective and robust approach to clustering on the basis of the connectivity of the points and their neighbors. In the new algorithm, a connected region generating (CRG) algorithm is developed to obtain the connected regions and an isolated point set. Each connected region corresponds to a homogeneous cluster and this ensures the separability of an arbitrary data set theoretically. Then, a region expansion strategy and a consensus criterion are used to deal with the points in the isolated point set. Experimental results on the synthetic datasets and the real world datasets show that the proposed algorithm has high performance and is insensitive to noise.

Subphenotypes of Acute Respiratory Distress Syndrome: Advancing towards Precision Medicine

  • Andrea R. Levine;Carolyn S. Calfee
    • Tuberculosis and Respiratory Diseases / v.87 no.1 / pp.1-11 / 2024
  • Acute respiratory distress syndrome (ARDS) is a common cause of severe hypoxemia defined by the acute onset of bilateral non-cardiogenic pulmonary edema. The diagnosis is made by defined consensus criteria. Supportive care, including prevention of further injury to the lungs, is the only treatment that conclusively improves outcomes. The inability to find more advanced therapies is due, in part, to the highly sensitive but relatively non-specific current syndromic consensus criteria, combining a heterogeneous population of patients under the umbrella of ARDS. With few effective therapies, the mortality rate remains 30% to 40%. Many subphenotypes of ARDS have been proposed to cluster patients with shared combinations of observable or measurable traits. Subphenotyping patients is a strategy to overcome heterogeneity to advance clinical research and eventually identify treatable traits. Subphenotypes of ARDS have been proposed based on radiographic patterns, protein biomarkers, transcriptomics, and/or machine-based clustering of clinical and biological variables. Some of these strategies have been reproducible across patient cohorts, but at present all have practical limitations to their implementation. Furthermore, there is no agreement on which strategy is the most appropriate. This review will discuss the current strategies for subphenotyping patients with ARDS, including the strengths and limitations, and the future directions of ARDS subphenotyping.

Mining Approximate Sequential Patterns in a Large Sequence Database (대용량 순차 데이터베이스에서 근사 순차패턴 탐색)

  • Kum Hye-Chung;Chang Joong-Hyuk
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.199-206 / 2006
  • Sequential pattern mining is an important data mining task with broad applications. However, conventional methods may meet inherent difficulties in mining databases with long sequences and noise. They may generate a huge number of short and trivial patterns but fail to find interesting patterns shared by many sequences. In this paper, to overcome these problems, we propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. The proposed method works in two steps: the first clusters target sequences by their similarities, and the second finds consensus patterns that are similar to the sequences in each cluster directly through multiple alignment. For this purpose, a novel structure called a weighted sequence is presented to compress the alignment result, and the longest consensus pattern that represents each cluster is generated from its weighted sequence. Finally, the effectiveness of the proposed method is verified by a set of experiments.
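The second step, deriving a consensus pattern from a weighted sequence, might look something like the sketch below. It assumes the sequences in one cluster are already aligned to equal length, with `None` marking a gap and each position holding a set of items; the threshold `min_strength` and the function names are hypothetical, and the alignment procedure itself is not shown.

```python
from collections import Counter


def weighted_sequence(aligned):
    """Compress aligned sequences into per-position item counts."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [
        Counter(item for seq in aligned if seq[pos] for item in seq[pos])
        for pos in range(length)
    ]


def consensus_pattern(aligned, min_strength=0.5):
    """Keep items that occur in at least min_strength of the sequences."""
    n = len(aligned)
    pattern = []
    for counts in weighted_sequence(aligned):
        itemset = sorted(item for item, c in counts.items() if c / n >= min_strength)
        if itemset:  # positions with no sufficiently common item are dropped
            pattern.append(itemset)
    return pattern


# Made-up cluster of three aligned sequences of itemsets.
cluster = [
    [{"a"}, {"b", "c"}, None, {"d"}],
    [{"a"}, {"b"}, {"x"}, {"d"}],
    [{"a", "e"}, None, None, {"d"}],
]
print(consensus_pattern(cluster, min_strength=0.6))
```

Because only per-position counts are stored, the weighted sequence stays small no matter how many sequences are in the cluster, and different thresholds can be tried without repeating the alignment.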

Identifying potential mergers of globular clusters: a machine-learning approach

  • Pasquato, Mario
    • The Bulletin of The Korean Astronomical Society / v.39 no.2 / pp.89-89 / 2014
  • While the current consensus view holds that galaxy mergers are commonplace, it is sometimes speculated that Globular Clusters (GCs) may also have undergone merging events, possibly resulting in massive objects with a strong metallicity spread such as Omega Centauri. Galaxies are mostly far, unresolved systems whose mergers are most likely wet, resulting in observational as well as modeling difficulties, but GCs are resolved into stars that can be used as discrete dynamical tracers, and their mergers might have been dry, therefore easily simulated with an N-body code. It is however difficult to determine the observational parameters best suited to reveal a history of merging based on the positions and kinematics of GC stars, if evidence of merging is at all observable. To overcome this difficulty, we investigate the applicability of supervised and unsupervised machine learning to the automatic reconstruction of the dynamical history of a stellar system. In particular we test whether statistical clustering methods can classify simulated systems into monolithic versus merger products. We run direct N-body simulations of two identical King-model clusters undergoing a head-on collision resulting in a merged system, and other simulations of isolated King models with the same total number of particles as the merged system. After several relaxation times elapse, we extract a sample of snapshots of the sky-projected positions of particles from each simulation at different dynamical times, and we run a variety of clustering and classification algorithms to classify the snapshots into two subsets in a relevant feature space.

Assessment of Population Structure and Genetic Diversity of 15 Chinese Indigenous Chicken Breeds Using Microsatellite Markers

  • Chen, Guohong;Bao, Wenbin;Shu, Jingting;Ji, Congliang;Wang, Minqiang;Eding, Herwin;Muchadeyi, Farai;Weigend, Steffen
    • Asian-Australasian Journal of Animal Sciences / v.21 no.3 / pp.331-339 / 2008
  • The genetic structure and diversity of 15 Chinese indigenous chicken breeds were investigated using 29 microsatellite markers. The total number of birds examined was 542, on average 36 birds per breed. A total of 277 alleles (mean number 9.55 alleles per locus, ranging from 2 to 25) was observed. All populations showed high levels of heterozygosity, with the lowest estimate of 0.440 for the Gushi chickens and the highest of 0.644 for Wannan Three-yellow chickens. The global heterozygote deficit across all populations (FIT) amounted to 0.180 (p<0.001). About 16% of the total genetic variability originated from differences between breeds, with all loci contributing significantly to this differentiation. An unrooted consensus tree was constructed using the Neighbour-Joining method and pair-wise distances based on marker estimated kinships. Two main groups were found. The heavy-body type populations grouped together in one cluster while the light-body type populations formed the second cluster. The STRUCTURE software was used to assess genetic clustering of these chicken breeds. Similar to the phylogenetic analysis, the heavy-body type and light-body type populations separated first. Clustering analysis provided an accurate representation of the current genetic relations among the breeds. Remarkably similar breed rankings were obtained with all methods.

Evaluation of Genetic Diversity among Soybean Genotypes Using SSR and SNP

  • Lee, Suk-Ha;P. Tanya;O, Srinives;T. Toojinda;A. Vanavichit;Ha, Bo-Keun;Bae, Jeong-Suk;Moon, Jung-Kyung
    • KOREAN JOURNAL OF CROP SCIENCE / v.46 no.4 / pp.334-340 / 2001
  • Two different types of molecular markers, simple sequence repeat (SSR) and single nucleotide polymorphism (SNP), were used to measure genetic diversity among five Korean, eight Thai, and three wild soybeans. For SSR analysis, a total of 20 markers were surveyed to detect polymorphisms. For SNP analysis, four primers were designed from consensus sequence regions on disease resistance protein homolog genes and used to amplify the genomic region. The PCR products were sequenced. A number of polymorphic SSR and SNP bands were scored on all genotypes and their genetic similarity was measured. Clustering analysis was performed independently on both types of markers. Clustering based on SSR markers separated the genotypes into three main groups originating from Korea, Thailand, and wild soybeans. On the other hand, two main groups were classified using SNP analysis. SSR appeared to be more informative than SNP in this study. This may be because SNP was surveyed on a smaller genomic region than SSR. Grouping based on the combined data of both markers gave results closer to those of SNP than to those of SSR. This might be because more loci from SNP than from SSR were considered in measuring genetic relatedness.

An Object Recognition Method Based on Depth Information for an Indoor Mobile Robot (실내 이동로봇을 위한 거리 정보 기반 물체 인식 방법)

  • Park, Jungkil;Park, Jaebyung
    • Journal of Institute of Control, Robotics and Systems / v.21 no.10 / pp.958-964 / 2015
  • In this paper, an object recognition method based on the depth information from the RGB-D camera, Xtion, is proposed for an indoor mobile robot. First, the RANdom SAmple Consensus (RANSAC) algorithm is applied to the point cloud obtained from the RGB-D camera to detect and remove the floor points. Next, the point cloud remaining after floor removal is partitioned by the k-means clustering method into each object's point cloud, and the normal vector of each point is obtained by using the k-d tree search. The obtained normal vectors are classified by the trained multi-layer perceptron into 18 classes and used as features for object recognition. To distinguish one object from another, the similarity between them is measured by using the Levenshtein distance. To verify the effectiveness and feasibility of the proposed object recognition method, experiments are carried out with several similar boxes.
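The final comparison step relies on the Levenshtein distance, which can be sketched as below. This is the standard dynamic-programming formulation; treating each object as a sequence of normal-vector class labels (integers from 0 to 17) is an assumption about how the features are encoded, and the label sequences shown are made up.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical class-label sequences for two box-like objects.
box_a = [3, 3, 7, 7, 12, 12]
box_b = [3, 7, 7, 12, 12, 12]
print(levenshtein(box_a, box_b))
```

A smaller distance means the two label sequences, and hence the two objects under this encoding, are more alike; the function works on any pair of sequences, including plain strings.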