• Title/Abstract/Keywords: Consensus clustering

17 search results (processing time: 0.027 s)

Consensus Clustering for Time Course Gene Expression Microarray Data

  • Kim, Seo-Young;Bae, Jong-Sung
    • Communications for Statistical Applications and Methods / Vol. 12, No. 2 / pp. 335-348 / 2005
  • The rapid development of microarray technologies has enabled the simultaneous monitoring of the expression levels of thousands of genes. Recently, time course gene expression data have often been measured to study dynamic biological systems and gene regulatory networks. For such data, biologists attempt to group genes based on the temporal pattern of their expression levels. We apply the consensus clustering algorithm to time course gene expression data in order to infer statistically meaningful information from the measurements. We evaluate consensus clustering and existing clustering methods with various validation measures. Specifically, we consider hierarchical clustering and Diana as existing methods, and consensus clustering based on hierarchical clustering, on Diana, and on a mix of the two, and we evaluate their performance on a real microarray data set and two simulated data sets.
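The idea of consensus clustering mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the base partitions are already available as label lists, builds a co-association (consensus) matrix from them, and then, as a simple stand-in for clustering that matrix hierarchically, links items that are co-clustered in more than half of the base partitions. The function names and the 0.5 threshold are illustrative assumptions.

```python
def co_association(partitions):
    """Fraction of base partitions that place items i and j in the same cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same = sum(1 for labels in partitions if labels[i] == labels[j])
            matrix[i][j] = same / len(partitions)
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Label connected components of the graph linking items whose
    co-association exceeds the threshold (an assumed, simplified final step)."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base partitions of six genes (e.g. from different methods).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as above, different label names
    [0, 0, 1, 1, 2, 2],
]
consensus = consensus_labels(co_association(base_partitions))
```

Because the co-association matrix only records whether two items share a cluster, it is unaffected by how each base method happens to name its clusters.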

The Similarity Plot for Comparing Clustering Methods

  • 장대흥
    • 응용통계연구 / Vol. 26, No. 2 / pp. 361-373 / 2013
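  • There are many algorithms for cluster analysis. To compare how these clustering methods partition objects into clusters, we need agreement measures that indicate how similar the resulting partitions are. As the number of clustering methods under consideration grows, so does the number of agreement measure values. A tool is therefore needed that shows multiple clustering methods and their corresponding agreement measure values at a glance. In this paper, we propose graphical tools for viewing clustering methods and their agreement measure values at a glance.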
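As an illustration of what an agreement measure between two clusterings can look like, the sketch below computes the Rand index for every pair of methods. The abstract does not say which agreement measures the paper uses, so the choice of the Rand index, the function names, and the example labels are all assumptions.

```python
from itertools import combinations


def rand_index(a, b):
    """Share of item pairs on which two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum(1 for i, j in pairs if (a[i] == a[j]) == (b[i] == b[j]))
    return agree / len(pairs)


def agreement_matrix(results):
    """Rand index for every ordered pair of named clustering results."""
    names = list(results)
    return {(p, q): rand_index(results[p], results[q]) for p in names for q in names}


# Hypothetical labelings of four objects by three methods.
results = {
    "method_a": [0, 0, 1, 1],
    "method_b": [0, 0, 1, 2],
    "method_c": [1, 1, 0, 0],
}
table = agreement_matrix(results)
```

With many methods the table grows quadratically, which is the situation in which a single graphical summary becomes useful.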

Fast Outlier Removal for Image Registration based on Modified K-means Clustering

  • Soh, Young-Sung;Qadir, Mudasar;Kim, In-Taek
    • 융합신호처리학회논문지 / Vol. 16, No. 1 / pp. 9-14 / 2015
  • Outlier detection and removal is a crucial step in many image processing applications, such as image registration. Random Sample Consensus (RANSAC) is known to be the best algorithm so far for outlier detection and removal. However, RANSAC requires considerable computation time. To drastically reduce the computation time while preserving comparable quality, an outlier detection and removal method based on modified K-means is proposed. The original K-means is first applied to the matching point pairs, and then cluster merging and member exclusion are performed in the modification step. We applied the method to various images with highly repetitive patterns under several geometric distortions and obtained successful results. We compared the proposed method with RANSAC and showed that it runs 3 to 10 times faster than RANSAC.
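The general idea of rejecting mismatched point pairs by clustering can be sketched as follows. This is not the modified K-means of the paper, whose merging and exclusion rules are not given in the abstract. The sketch assumes a pure translation between the two images, runs plain K-means on the displacement vectors of the matched pairs, and keeps the largest cluster as inliers. All names and the seeding rule are illustrative assumptions.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add
    the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def inlier_indices(src, dst, k=2):
    """Indices of matches whose displacement falls in the dominant cluster."""
    displacements = [(qx - px, qy - py) for (px, py), (qx, qy) in zip(src, dst)]
    labels = kmeans(displacements, k)
    dominant = Counter(labels).most_common(1)[0][0]
    return [i for i, lab in enumerate(labels) if lab == dominant]


# Hypothetical matches: four consistent with a shift of (10, 5), one wrong.
src = [(0, 0), (4, 1), (2, 5), (7, 3), (1, 1)]
dst = [(10, 5), (14, 6), (12, 10), (17, 8), (-30, 40)]
kept = inlier_indices(src, dst)
```

Unlike RANSAC, this involves no repeated random sampling and model fitting, which is the kind of saving the abstract attributes to a clustering-based approach.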

A new clustering algorithm based on the connected region generation

  • Feng, Liuwei;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 6 / pp. 2619-2643 / 2018
  • In this paper, a new clustering algorithm based on connected region generation (CRG-clustering) is proposed. It is an effective and robust approach to clustering based on the connectivity of points and their neighbors. In the new algorithm, a connected region generating (CRG) algorithm is developed to obtain the connected regions and an isolated point set. Each connected region corresponds to a homogeneous cluster, which theoretically ensures the separability of an arbitrary data set. Then, a region expansion strategy and a consensus criterion are used to deal with the points in the isolated point set. Experimental results on synthetic and real-world datasets show that the proposed algorithm has high performance and is insensitive to noise.

Subphenotypes of Acute Respiratory Distress Syndrome: Advancing towards Precision Medicine

  • Andrea R. Levine;Carolyn S. Calfee
    • Tuberculosis and Respiratory Diseases / Vol. 87, No. 1 / pp. 1-11 / 2024
  • Acute respiratory distress syndrome (ARDS) is a common cause of severe hypoxemia defined by the acute onset of bilateral non-cardiogenic pulmonary edema. The diagnosis is made by defined consensus criteria. Supportive care, including prevention of further injury to the lungs, is the only treatment that conclusively improves outcomes. The inability to find more advanced therapies is due, in part, to the highly sensitive but relatively non-specific current syndromic consensus criteria, which combine a heterogeneous population of patients under the umbrella of ARDS. With few effective therapies, the mortality rate remains 30% to 40%. Many subphenotypes of ARDS have been proposed to cluster patients with shared combinations of observable or measurable traits. Subphenotyping patients is a strategy to overcome heterogeneity to advance clinical research and eventually identify treatable traits. Subphenotypes of ARDS have been proposed based on radiographic patterns, protein biomarkers, transcriptomics, and/or machine-based clustering of clinical and biological variables. Some of these strategies have been reproducible across patient cohorts, but at present all have practical limitations to their implementation. Furthermore, there is no agreement on which strategy is the most appropriate. This review will discuss the current strategies for subphenotyping patients with ARDS, including their strengths and limitations, and the future directions of ARDS subphenotyping.

Mining Approximate Sequential Patterns in a Large Sequence Database

  • 금혜정;장중혁
    • 정보처리학회논문지D / Vol. 13D, No. 2 / pp. 199-206 / 2006
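  • Sequential pattern mining is regarded as a very important data mining task in a variety of application domains. However, existing sequential pattern mining methods have limitations when mining databases that contain long sequential patterns or a large amount of noise. They find very short, trivial patterns but have difficulty uncovering the important patterns shared by many sequences. To address this problem, this paper proposes a method for mining approximate sequential patterns in large databases. An approximate sequential pattern is a sequential pattern that is approximately shared by many sequences. The proposed method consists of two steps: the first partitions the sequences under analysis into several clusters according to their similarity, and the second finds a representative pattern for each cluster by applying multiple alignment. To this end, we present the weighted sequential pattern, which can represent many sequences as one, and the sequences are merged into weighted sequential patterns. Each weighted sequential pattern, carrying this merged information, is then used to generate a single representative pattern that approximates many sequences. Finally, the usefulness of the proposed method is verified through various experiments.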

Identifying potential mergers of globular clusters: a machine-learning approach

  • Pasquato, Mario
    • 천문학회보 / Vol. 39, No. 2 / pp. 89-89 / 2014
  • While the current consensus view holds that galaxy mergers are commonplace, it is sometimes speculated that Globular Clusters (GCs) may also have undergone merging events, possibly resulting in massive objects with a strong metallicity spread such as Omega Centauri. Galaxies are mostly far, unresolved systems whose mergers are most likely wet, resulting in observational as well as modeling difficulties, but GCs are resolved into stars that can be used as discrete dynamical tracers, and their mergers might have been dry, therefore easily simulated with an N-body code. It is however difficult to determine the observational parameters best suited to reveal a history of merging based on the positions and kinematics of GC stars, if evidence of merging is at all observable. To overcome this difficulty, we investigate the applicability of supervised and unsupervised machine learning to the automatic reconstruction of the dynamical history of a stellar system. In particular we test whether statistical clustering methods can classify simulated systems into monolithic versus merger products. We run direct N-body simulations of two identical King-model clusters undergoing a head-on collision resulting in a merged system, and other simulations of isolated King models with the same total number of particles as the merged system. After several relaxation times elapse, we extract a sample of snapshots of the sky-projected positions of particles from each simulation at different dynamical times, and we run a variety of clustering and classification algorithms to classify the snapshots into two subsets in a relevant feature space.

Assessment of Population Structure and Genetic Diversity of 15 Chinese Indigenous Chicken Breeds Using Microsatellite Markers

  • Chen, Guohong;Bao, Wenbin;Shu, Jingting;Ji, Congliang;Wang, Minqiang;Eding, Herwin;Muchadeyi, Farai;Weigend, Steffen
    • Asian-Australasian Journal of Animal Sciences / Vol. 21, No. 3 / pp. 331-339 / 2008
  • The genetic structure and diversity of 15 Chinese indigenous chicken breeds were investigated using 29 microsatellite markers. The total number of birds examined was 542, on average 36 birds per breed. A total of 277 alleles (mean 9.55 alleles per locus, ranging from 2 to 25) were observed. All populations showed high levels of heterozygosity, with the lowest estimate of 0.440 for the Gushi chickens and the highest of 0.644 for the Wannan Three-yellow chickens. The global heterozygote deficit across all populations (FIT) amounted to 0.180 (p<0.001). About 16% of the total genetic variability originated from differences between breeds, with all loci contributing significantly to this differentiation. An unrooted consensus tree was constructed using the Neighbour-Joining method and pair-wise distances based on marker estimated kinships. Two main groups were found. The heavy-body type populations grouped together in one cluster while the light-body type populations formed the second cluster. The STRUCTURE software was used to assess genetic clustering of these chicken breeds. Similar to the phylogenetic analysis, the heavy-body type and light-body type populations separated first. Clustering analysis provided an accurate representation of the current genetic relations among the breeds. Remarkably similar breed rankings were obtained with all methods.

Evaluation of Genetic Diversity among Soybean Genotypes Using SSR and SNP

  • Lee, Suk-Ha;P. Tanya;O, Srinives;T. Toojinda;A. Vanavichit;Ha, Bo-Keun;Bae, Jeong-Suk;Moon, Jung-Kyung
    • 한국작물학회지 / Vol. 46, No. 4 / pp. 334-340 / 2001
  • Two different types of molecular markers, simple sequence repeat (SSR) and single nucleotide polymorphism (SNP), were used to measure genetic diversity among five Korean, eight Thai, and three wild soybeans. For SSR analysis, a total of 20 markers were surveyed to detect polymorphisms. For SNP analysis, four primers were designed from consensus sequence regions on disease resistance protein homolog genes and used to amplify the genomic region. The PCR products were sequenced. A number of polymorphic SSR and SNP bands were scored on all genotypes and their genetic similarity was measured. Clustering analysis was performed independently on both types of markers. Clustering based on SSR markers separated the genotypes into three main groups: Korean, Thai, and wild soybeans. In contrast, SNP analysis produced two main groups. SSR appeared to be more informative than SNP in this study, which may be because SNP was surveyed on a smaller genomic region than SSR. Grouping based on the combined data of both markers gave results closer to those of SNP than to those of SSR. This might be because more loci from SNP than from SSR were considered in measuring genetic relatedness.

An Object Recognition Method Based on Depth Information for an Indoor Mobile Robot

  • 박정길;박재병
    • 제어로봇시스템학회논문지 / Vol. 21, No. 10 / pp. 958-964 / 2015
  • In this paper, an object recognition method based on depth information from the RGB-D camera Xtion is proposed for an indoor mobile robot. First, the RANdom SAmple Consensus (RANSAC) algorithm is applied to the point cloud obtained from the RGB-D camera to detect and remove the floor points. Next, the point cloud with the floor removed is divided by the k-means clustering method into a point cloud for each object, and the normal vector of each point is obtained by using a k-d tree search. The obtained normal vectors are classified into 18 classes by a trained multi-layer perceptron and used as features for object recognition. To distinguish one object from another, the similarity between them is measured by using the Levenshtein distance. To verify the effectiveness and feasibility of the proposed object recognition method, experiments are carried out with several similar boxes.
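The Levenshtein distance used in the last step of this pipeline can be sketched as below. The abstract does not say how an object's features are laid out as a sequence, so the example assumes, hypothetically, that each object is summarized as a list of normal-vector class labels (integers from 0 to 17).

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical class-label sequences for two objects.
object_a = [3, 3, 7, 12]
object_b = [3, 7, 7, 12, 1]
distance = levenshtein(object_a, object_b)
```

A smaller distance means the two label sequences, and under this assumed encoding the two objects, are more alike. The function works on any pair of sequences, including strings.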