• Title/Abstract/Keywords: Microarray data analysis

322 results found (processing time: 0.023 s)

Comparison of clustering methods of microarray gene expression data

  • 임진수;임동훈
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 1 / pp. 39-51 / 2012
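  • Cluster analysis is an important tool for investigating the association structure among genes or samples with similar characteristics in microarray expression data. In this paper, we compare the performance of hierarchical clustering, K-means, PAM (partitioning around medoids), SOM (self-organizing maps), and model-based clustering on microarray data using three types of cluster validation measures: internal, stability, and biological measures. The methods were compared on simulated data and on the real SRBCT (small round blue cell tumor) data. On the simulated data, almost all methods produced good clusterings that agreed with the original data under all three types of measures. On the SRBCT data, the results were not as clear-cut as on the simulated data. In terms of silhouette width, an internal measure, PAM, SOM, and model-based clustering gave results similar to those of the simulation; in terms of the biological measures, PAM and model-based clustering did so; and under the stability measures, model-based clustering gave better clusterings than the other methods.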
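
The silhouette width named in the abstract above can be illustrated with a short sketch. This is not the authors' code: it assumes Euclidean distance and the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the function name `silhouette_width` is a hypothetical helper introduced here.

```python
from math import dist


def silhouette_width(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0 by convention
        # a(i): mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:  # identical points everywhere would give 0/0; treat as 0
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_width(pts, [0, 0, 1, 1]))  # well-separated grouping
    print(silhouette_width(pts, [0, 1, 0, 1]))  # poor grouping
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.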

Bayesian Variable Selection in the Proportional Hazard Model

  • Lee, Kyeong-Eun
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 3 / pp. 605-616 / 2004
  • In this paper we consider proportional hazards models for survival analysis with microarray data. For a given vector of response values and gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method.


An Iterative Normalization Algorithm for cDNA Microarray Medical Data Analysis

  • Kim, Yoonhee;Park, Woong-Yang;Kim, Ho
    • Genomics & Informatics / Vol. 2, No. 2 / pp. 92-98 / 2004
  • A cDNA microarray experiment is one of the most useful high-throughput experiments in medical informatics for monitoring gene expression levels. Statistical analysis of cDNA microarray medical data requires a normalization procedure to reduce systematic errors that cannot be controlled by the experimental conditions. Despite the variety of existing normalization methods, this paper suggests a more general and synthetic normalization algorithm with a control gene set, based on previous studies of normalization. Starting from the housekeeping genes, the iterative normalization method selects a new control gene set from among all genes at every step of the normalization calculation. The objective of this iterative normalization was to maintain the pattern of the original data and to keep the gene expression levels stable. Spatial plots, M&A (ratio and average values of the intensity) plots and box plots showed graphically that the mean across all genes converged to zero after applying the iterative normalization. The practicability of the algorithm was demonstrated by applying it to data from a human photoaging study.
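
The idea of normalizing against a control gene set that is re-selected at each step can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of the paper: M and A are taken in their usual two-channel form (M = log2(R/G), A = mean of log2 R and log2 G), and the rule for re-selecting control genes (keep genes whose centered |M| is at most `threshold`) is a hypothetical stand-in, since the abstract does not state the selection criterion. All function names are introduced here for illustration.

```python
from math import log2


def ma_values(red, green):
    """Per-gene (M, A): M = log2(R/G), A = (log2 R + log2 G) / 2."""
    return [(log2(r / g), 0.5 * (log2(r) + log2(g))) for r, g in zip(red, green)]


def center_on_controls(m_values, control_idx):
    """Subtract the mean M of the control genes from every gene's M."""
    shift = sum(m_values[i] for i in control_idx) / len(control_idx)
    return [m - shift for m in m_values]


def iterative_normalize(m_values, housekeeping_idx, threshold=0.5, max_iter=20):
    """Center M on a control set that is re-selected after every step.

    ASSUMPTION: the re-selection rule (|M| <= threshold) is hypothetical.
    """
    control = set(housekeeping_idx)
    m = list(m_values)
    for _ in range(max_iter):
        m = center_on_controls(m, sorted(control))
        new_control = {i for i, v in enumerate(m) if abs(v) <= threshold}
        if not new_control or new_control == control:
            break  # control set is stable (or empty), so stop iterating
        control = new_control
    return m, sorted(control)


if __name__ == "__main__":
    red = [200, 400, 800, 1600, 3200]
    green = [100, 200, 400, 800, 100]
    m_raw = [m for m, _ in ma_values(red, green)]
    m_norm, controls = iterative_normalize(m_raw, housekeeping_idx=[0])
    print(m_norm, controls)
```

In this toy input the first four genes share a constant dye bias, so after normalization their M values are centered at zero while the last gene keeps its large log ratio.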

Development of Clustering Algorithm and Tool for DNA Microarray Data

  • 여상수;김성권
    • 한국정보과학회논문지:시스템및이론 / Vol. 30, No. 10 / pp. 544-555 / 2003
  • Data from DNA microarray experiments contain a very large amount of gene expression information, so appropriate analysis methods are needed. The representative analysis method is hierarchical clustering. In this paper, we study leaf-ordering, a post-processing step applied to the dendrogram produced by hierarchical clustering that makes DNA microarray data easier to analyze. We first analyze existing leaf-ordering algorithms and then propose a new approach to leaf-ordering. To test and analyze its performance, we also introduce HCLO (Hierarchical Clustering & Leaf-Ordering Tool), which implements hierarchical clustering, several leaf-ordering algorithms, and the proposed approach.

Large-Circular Single-stranded Sense and Antisense DNA for Identification of Cancer-Related Genes

  • 배윤위;문익재;서영배;도경오
    • 한국미생물·생명공학회지 / Vol. 38, No. 1 / pp. 70-76 / 2010
  • Single-stranded large circular (LC)-sense DNAs were utilized as probes for DNA chip experiments. The microarray experiment using LC-sense DNA probes found differentially expressed genes in A549 cells as compared to WI38VA13 cells, and the microarray data were well correlated with data acquired from quantitative real-time RT-PCR. A 5K LC-sense DNA microarray was prepared, and repeated experiments and a dye swap test showed consistent expression patterns. Subsequent functional analysis using an LC-antisense library of the overexpressed genes identified several genes involved in A549 cell growth. These experiments demonstrated that LC-sense molecules are suitable as probe DNA for microarrays, and that combining LC-sense microarrays with antisense libraries has potential utility for effective functional validation of genes.

Development of a Reproducibility Index for cDNA Microarray Experiments

  • 김병수;라선영
    • 한국통계학회:학술대회논문집 / Korean Statistical Society, Proceedings of the 2002 Spring Conference / pp. 79-83 / 2002
  • Since their introduction in 1995 by Schena et al., cDNA microarrays have been established as a potential tool for high-throughput analysis which allows the global monitoring of expression levels for thousands of genes simultaneously. One of the characteristics of cDNA microarray data is that there is inherent noise even after the removal of systematic effects in the experiment. Therefore, replication is crucial to the microarray experiment. The assessment of reproducibility among replicates, however, has drawn little attention. Reproducibility may be assessed with several different endpoints along the process of data reduction of the microarray data. We define reproducibility as the degree to which replicate arrays duplicate each other. The aim of this note is to develop a novel measure of reproducibility among replicates in the cDNA microarray experiment based on the unprocessed data. Suppose we have p genes and n replicates in a microarray experiment. We first develop a measure of reproducibility between two replicates and generalize this concept to a measure of reproducibility of one replicate against the remaining n-1 replicates. We used the rank of the outcome variable and employed the concept of a measure of tracking from the blood pressure literature. We applied the reproducibility measure to two sets of microarray experiments, one of which was performed in a more homogeneous environment, and the results validated this novel method. The operational interpretation of this measure is clearer than that of Pearson's correlation coefficient, which might be used as a crude measure of reproducibility of two replicates.


Building a Classifier for Integrated Microarray Datasets through Two-Stage Approach

  • 윤영미;이종찬;박상현
    • 한국정보과학회논문지:데이타베이스 / Vol. 34, No. 1 / pp. 46-58 / 2007
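  • Because microarray data contain expression values for tens of thousands of genes at once, they are very useful for classifying disease phenotypes. However, even for the same biological topic, the analysis results of microarrays generated by independent research groups can differ. The main reason is that the number of samples in a single microarray experiment is limited. Integrating independently generated microarray datasets to increase the number of samples is therefore very important for more accurate analysis. As a solution, this study proposes a two-stage approach. In the first stage, independently generated microarray datasets on the same topic are integrated and informative genes are extracted. In the second stage, classification is performed using only the informative genes, and a classifier is extracted. Experiments applying this classifier to separate test sample data show that, as integration increases the number of samples, the resulting classifier achieves up to 24.19% higher accuracy than the comparison methods.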

Biological Pathway Extension Using Microarray Gene Expression Data

  • Chung, Tae-Su;Kim, Ji-Hun;Kim, Kee-Won;Kim, Ju-Han
    • Genomics & Informatics / Vol. 6, No. 4 / pp. 202-209 / 2008
  • Biological pathways are collections of knowledge about certain biological processes. Although knowledge about a pathway is significant for further analysis, it covers only a tiny portion of the genes that exist. In this paper, we suggest a model to extend each individual pathway using microarray expression data, based on the existing knowledge about the pathway. We use the Rosetta compendium dataset to extend pathways of Saccharomyces cerevisiae obtained from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. Before applying our model, we verify the underlying assumption that microarray data reflect the interaction knowledge in the pathway, and we evaluate our scoring system by introducing a performance function. In the last step, we validate the proposed candidates with the help of another type of biological information. In summary, we introduce a pathway-extending model that uses the intrinsic structure of a pathway together with microarray expression data. The model provides suitable candidate genes for extending each biological pathway.

Analysis of Key Genes and Pathways Associated with Colorectal Cancer with Microarray Technology

  • Liu, Yan-Jun;Zhang, Shu;Hou, Kang;Li, Yun-Tao;Liu, Zhan;Ren, Hai-Liang;Luo, Dan;Li, Shi-Hong
    • Asian Pacific Journal of Cancer Prevention / Vol. 14, No. 3 / pp. 1819-1823 / 2013
  • Objective: Microarray data were analyzed to explore key genes and their functions in the progression of colorectal cancer (CRC). Methods: Two microarray data sets were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified using the corresponding R packages. Functional enrichment analysis was performed with DAVID tools to uncover their biological functions. Results: 631 and 590 DEGs were obtained from the two data sets, respectively. A total of 32 common DEGs were then screened out with the rank product method. The significantly enriched GO terms included inflammatory response, response to wounding and response to drugs. Two interleukin-related domains were revealed in the domain analysis. KEGG pathway enrichment analysis showed that the PPAR signaling pathway and the renin-angiotensin system were enriched in the DEGs. Conclusions: Our study, which systematically characterized gene expression changes in CRC with microarray technology, revealed changes in a range of key genes, pathways and function modules. Their utility in diagnosis and treatment now requires exploration.
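
The rank product statistic mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it computes only the geometric mean of each gene's rank across data sets, breaks ties by input order as a simplification, and omits the permutation-based significance estimate that a full analysis would include. The function names and the toy fold-change values are hypothetical.

```python
from math import prod


def ranks_descending(values):
    """Rank 1 = largest value; ties are broken by input order (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(fold_changes_per_dataset):
    """Geometric mean of each gene's rank across the given data sets."""
    k = len(fold_changes_per_dataset)
    per_dataset_ranks = [ranks_descending(fc) for fc in fold_changes_per_dataset]
    n_genes = len(per_dataset_ranks[0])
    return [
        prod(ranks[g] for ranks in per_dataset_ranks) ** (1.0 / k)
        for g in range(n_genes)
    ]


if __name__ == "__main__":
    # Toy log fold changes for three genes in two data sets (made-up numbers).
    dataset_a = [3.0, 1.0, 2.0]
    dataset_b = [2.5, 0.5, 1.5]
    print(rank_product([dataset_a, dataset_b]))
```

Genes with a small rank product are consistently near the top of the ranking in every data set, which is why the statistic is useful for finding DEGs shared across studies.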

Statistical Analysis of Gene Expression Data

  • 박태성
    • 한국생물정보학회:학술대회논문집 / Korean Society for Bioinformatics and Systems Biology, 2nd Bioinformatics Workshop, 2001 (DNA Chip Bioinformatics) / pp. 97-115 / 2001
  • cDNA microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. Many statistical analysis tools have become widely applicable to the analysis of cDNA microarray data. In this talk, we consider a two-way ANOVA model to differentiate genes that have high variability from those that do not. Using this model, we detect genes that have different gene expression profiles among experimental groups. The two-way ANOVA model is illustrated using cDNA microarrays of 3,800 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells.
