• Title/Abstract/Keywords: Microarray Data Analysis

323 search results (processing time: 0.033 s)

PathTalk: Interpretation of Microarray Gene-Expression Clusters in Association with Biological Pathways

  • Chung, Tae-Su;Chung, Hee-Joon;Kim, Ju-Han
    • Genomics & Informatics
    • /
    • Vol. 5, No. 3
    • /
    • pp.124-128
    • /
    • 2007
  • Microarray technology enables us to measure the expression of tens of thousands of genes simultaneously under various experimental conditions. Clustering analysis is one of the most successful methods for analyzing microarray data, based on the assumption that co-expressed genes may be co-regulated. It is important to extract meaningful clusters from a long unordered list of clusters and to evaluate the functional homogeneity and heterogeneity of clusters. Many quality measures for clustering results have been suggested under different conditions. In the present study, we consider biological pathways as a collection of biological knowledge and use them as a reference for measuring the quality of clustering results and their functional homogeneity. PathTalk visualizes and evaluates functional relationships between gene clusters and biological pathways.
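
One common way to score a gene cluster against a pathway is an upper-tail hypergeometric test on their overlap. The sketch below illustrates that idea only; the abstract does not say which measure PathTalk uses, so the test itself, the function names, and the toy gene identifiers are all assumptions.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_cluster, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of pathway genes found when n_cluster genes are drawn
    without replacement from n_universe genes, n_pathway of which are
    annotated to the pathway.
    """
    upper = min(n_pathway, n_cluster)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_cluster)


def score_cluster(cluster, pathway, universe):
    """Return (overlap size, p-value) for one cluster against one pathway."""
    universe = set(universe)
    cluster = set(cluster) & universe   # ignore genes outside the universe
    pathway = set(pathway) & universe
    overlap = len(cluster & pathway)
    p = enrichment_p_value(len(universe), len(pathway), len(cluster), overlap)
    return overlap, p


# Toy data: g0..g9 are placeholder identifiers, not real genes.
universe = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2", "g3"]
cluster = ["g0", "g1", "g2"]
overlap, p = score_cluster(cluster, pathway, universe)
print(overlap, p)
```

A small p-value suggests the cluster is functionally homogeneous with respect to that pathway. When many clusters and pathways are compared, the p-values would need a multiple-testing correction.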

A Report on the Inter-Gene Correlations in cDNA Microarray Data Sets

  • 김병수;장지선;김상철;임요한
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 22, No. 3
    • /
    • pp.617-626
    • /
    • 2009
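  • A series of recent studies report that inter-gene correlations in Affymetrix microarray data are strong and long-ranged, and that the conventional "convenient" assumption, namely that inter-gene correlations are very weak and genes can therefore be treated as approximately independent, is unrealistic. Qiu et al. (2005b) report that the so-called nonparametric empirical Bayes method, which pools the test statistics of individual genes for statistical inference, inflates the variance of the number of genes declared differentially expressed, and they identify strong inter-gene correlation as the cause of this instability. Klebanov and Yakovlev (2007) argue that inter-gene correlation is a source of useful information rather than an obstacle to statistical analysis, and that a suitable transformation yields a sequence that remains approximately independent, which they call the ${\delta}$-sequence. In this report we confirm that inter-gene correlations are relatively strong and long-ranged in two cDNA microarray data sets produced in Korea, and we report that a ${\delta}$-sequence for which approximate independence can be assumed is also found in cDNA microarray data. The report emphasizes that inter-gene correlations should be taken into account in future analyses of cDNA microarray data.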
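
The sketch below measures the average absolute pairwise correlation between genes, then forms paired differences as a way to weaken a shared array effect. The pairing rule (sort genes by variance, then difference non-overlapping neighbours) is an assumption about how a ${\delta}$-sequence might be built; the abstract does not spell out the construction. The toy values and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_correlation(genes):
    """Average |r| over all gene pairs; genes is a list of per-gene vectors."""
    return mean(abs(pearson(g, h)) for g, h in combinations(genes, 2))


def paired_differences(genes):
    """Assumed construction: sort by variance, difference neighbouring pairs."""
    ordered = sorted(genes, key=pvariance)
    return [
        [b - a for a, b in zip(ordered[i], ordered[i + 1])]
        for i in range(0, len(ordered) - 1, 2)
    ]


# Toy log-expression data: every gene shares the same array effect plus noise.
array_effect = [0.0, 2.0, 4.0, 6.0, 8.0]
noise = [
    [0.1, -0.1, 0.0, 0.1, -0.1],
    [-0.1, 0.1, 0.1, -0.1, 0.0],
    [0.0, 0.1, -0.1, -0.1, 0.1],
    [0.1, 0.0, -0.1, 0.1, -0.1],
]
genes = [[e + n for e, n in zip(array_effect, row)] for row in noise]

raw = mean_abs_correlation(genes)
deltas = paired_differences(genes)
reduced = mean_abs_correlation(deltas)
print(raw, reduced)
```

In this toy case the shared array effect drives the raw correlations close to 1, and differencing cancels it. Real data would need many more genes and arrays before any statement about approximate independence could be made.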

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene

  • 박수영;정채영
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 15, No. 3
    • /
    • pp.747-752
    • /
    • 2011
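  • Microarray classification typically has two notable characteristics: classifier design and error estimation are based on remarkably small samples, and cross-validation error estimation is used in the majority of papers. Ovarian cancer microarray data consist of expression levels for tens of thousands of genes, and there is no systematic procedure for analyzing this information simultaneously. In this paper, marker genes were selected by ranking genes according to a statistic, and three widely used classification rules (linear discriminant analysis, 3-nearest-neighbor, and decision trees) were used to compare classification accuracy on data with and without marker gene selection. Applying the linear discriminant analysis rule to the microarray data set containing marker genes selected by ANOVA gave the highest classification accuracy, 97.78%, and the lowest prediction error estimate.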
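
The sketch below ranks genes by a one-way ANOVA F statistic and then classifies a sample with a 3-nearest-neighbor rule, one of the three rules named in the abstract. It is an illustration under assumptions: the function names, the toy expression values, and the class labels are hypothetical, and the paper's actual data and settings are not reproduced.

```python
from collections import Counter, defaultdict
from math import dist


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single gene across labelled samples."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_marker_genes(samples, labels, top):
    """Indices of the `top` genes with the largest F statistic."""
    n_genes = len(samples[0])
    scores = [anova_f([s[j] for s in samples], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:top]


def knn_predict(train, train_labels, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


# Toy data: 6 samples x 3 genes; only the middle gene separates the classes.
samples = [
    [5.0, 1.0, 3.0], [5.2, 1.2, 2.0], [4.9, 0.9, 4.0],
    [5.1, 3.0, 3.1], [5.0, 3.2, 2.1], [5.1, 2.9, 3.9],
]
labels = ["normal"] * 3 + ["tumor"] * 3

markers = select_marker_genes(samples, labels, top=1)
reduced = [[s[j] for j in markers] for s in samples]
prediction = knn_predict(reduced, labels, [3.1], k=3)
print(markers, prediction)
```

When accuracy is estimated by cross-validation, the gene ranking should be recomputed inside each fold. Selecting genes on the full data first leaks label information into the held-out samples and makes the error estimate optimistic.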

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 24, No. 6
    • /
    • pp.1103-1113
    • /
    • 2011
  • Gene expression data are obtained through many experimental stages, and errors introduced along the way can produce missing values. Because the data are of the so-called 'small n, large p' type, genes have to be selected before statistical analyses such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection, and classification have each been studied separately, even though in practice they are applied sequentially. We therefore compare the performance of classification methods after imputation and gene selection methods have been applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of imputation and gene selection methods.
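
The sequential pipeline described above (impute, then select genes, then classify) can be sketched as follows. Each stage uses a deliberately simple stand-in chosen for illustration: gene-wise mean imputation, ranking by the gap between two class means, and a nearest-centroid classifier. None of these is confirmed as a method compared in the paper, and all names and values are hypothetical.

```python
from statistics import mean


def impute_gene_means(samples):
    """Replace None with the mean of the observed values of the same gene.

    Raises statistics.StatisticsError if a gene has no observed value at all.
    """
    n_genes = len(samples[0])
    col_means = [
        mean(s[j] for s in samples if s[j] is not None) for j in range(n_genes)
    ]
    return [
        [col_means[j] if s[j] is None else s[j] for j in range(n_genes)]
        for s in samples
    ]


def select_genes(samples, labels, top):
    """Rank genes by |mean(class a) - mean(class b)|; two classes only."""
    a, b = sorted(set(labels))

    def gap(j):
        mean_a = mean(s[j] for s, lab in zip(samples, labels) if lab == a)
        mean_b = mean(s[j] for s, lab in zip(samples, labels) if lab == b)
        return abs(mean_a - mean_b)

    return sorted(range(len(samples[0])), key=gap, reverse=True)[:top]


def nearest_centroid(train, labels, query):
    """Assign `query` to the class whose mean vector is closest."""
    centroids = {}
    for c in set(labels):
        rows = [s for s, lab in zip(train, labels) if lab == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((q - m) ** 2 for q, m in zip(query, centroids[c])),
    )


# Toy data with missing values (None); 4 samples x 3 genes.
samples = [
    [1.0, None, 5.0],
    [1.2, 2.0, 5.1],
    [3.0, 2.1, None],
    [3.1, 1.9, 5.0],
]
labels = ["A", "A", "B", "B"]

filled = impute_gene_means(samples)             # stage 1: imputation
keep = select_genes(filled, labels, top=1)      # stage 2: gene selection
train = [[s[j] for j in keep] for s in filled]
predicted = nearest_centroid(train, labels, [2.9])  # stage 3: classification
print(keep, predicted)
```

Because each stage feeds the next, swapping the imputation rule can change which genes are selected and, in turn, the classifier's accuracy, which is why the combinations are worth comparing jointly.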

Bayesian Variable Selection in the Proportional Hazard Model with Application to Microarray Data

  • Lee, Kyeong-Eun;Mallick, Bani K.
    • The Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2005 Spring Conference
    • /
    • pp.17-23
    • /
    • 2005
  • In this paper we consider the well-known semiparametric proportional hazards models for survival analysis. These models are usually used with few covariates and many observations (subjects). For a typical setting of gene expression data from DNA microarrays, however, we need to consider the case where the number of covariates p exceeds the number of samples n. Given a vector of response values, which are times to event (death or censoring times), and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and Breast Carcinomas data.
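
For orientation, the model and the prior on the number of selected genes can be written as below. The hazard is the standard proportional hazards form; the inclusion indicators $\gamma_j$, the count $p_\gamma$, and the generic prior $\pi$ are notation assumed here for illustration and may differ from the paper's.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right), \qquad
\beta_j = 0 \ \text{whenever}\ \gamma_j = 0, \qquad
p_\gamma = \sum_{j=1}^{p} \gamma_j, \qquad
p_\gamma \sim \pi(\cdot)\ \text{on}\ \{0, 1, \dots, p\}.
```

A prior $\pi$ that places little mass on large $p_\gamma$ plays the role of the penalty mentioned in the abstract, keeping the number of selected genes small relative to $n$.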

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents
    • /
    • Vol. 8, No. 1
    • /
    • pp.23-29
    • /
    • 2012
  • Multiple expression levels of genes obtained using time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data typically take the form of large, high-dimensional gene expression matrices. Among the huge number of genes presented in microarrays, only a small number are expected to be relevant to a given task. Hence, discarding the majority of irrelevant genes is the crucial goal of gene selection, which improves accuracy in disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features of multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. A representative unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results in terms of the predictive accuracy of clustering compared to existing methods of analysis. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

Reliability of microarray analysis for studying periodontitis: low consistency in 2 periodontitis cohort data sets from different platforms and an integrative meta-analysis

  • Jeon, Yoon-Seon;Shivakumar, Manu;Kim, Dokyoon;Kim, Chang-Sung;Lee, Jung-Seok
    • Journal of Periodontal and Implant Science
    • /
    • Vol. 51, No. 1
    • /
    • pp.18-29
    • /
    • 2021
  • Purpose: The aim of this study was to compare the characteristic expression patterns of advanced periodontitis in 2 cohort data sets analyzed using different microarray platforms, and to identify differentially expressed genes (DEGs) through a meta-analysis of both data sets. Methods: Twenty-two patients for cohort 1 and 40 patients for cohort 2 were recruited with the same inclusion criteria. The 2 cohort groups were analyzed using different platforms: Illumina and Agilent. A meta-analysis was performed to increase reliability by removing statistical differences between platforms. An integrative meta-analysis based on an empirical Bayesian methodology (ComBat) was conducted. DEGs for the integrated data sets were identified using the limma package to adjust for age, sex, and platform and compared with the results for cohorts 1 and 2. Clustering and pathway analyses were also performed. Results: This study detected 557 and 246 DEGs in cohorts 1 and 2, respectively, with 146 and 42 significantly enriched gene ontology (GO) terms. Overlapping between cohorts 1 and 2 was present in 59 DEGs and 18 GO terms. However, only 6 genes from the top 30 enriched DEGs overlapped, and there were no overlapping GO terms in the top 30 enriched pathways. The integrative meta-analysis detected 34 DEGs, of which 10 overlapped in all the integrated data sets of cohorts 1 and 2. Conclusions: The characteristic expression pattern differed between periodontitis and the healthy periodontium, but the consistency between the data sets from different cohorts and metadata was too low to suggest specific biomarkers for identifying periodontitis.
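
The consistency between two DEG lists can be summarized with simple overlap measures. The sketch below is only an illustration: the Jaccard index and the function names are choices made here, not measures reported by the study, and the only figures taken from the abstract are the counts 557, 246, and 59.

```python
def overlap_summary(genes_a, genes_b):
    """Overlap statistics for two lists of gene identifiers."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "fraction_of_smaller": len(shared) / min(len(a), len(b)) if a and b else 0.0,
    }


def jaccard_from_counts(n_a, n_b, n_shared):
    """Jaccard index when only the list sizes and the overlap size are known."""
    return n_shared / (n_a + n_b - n_shared)


# Counts reported in the abstract: 557 and 246 DEGs, 59 of them shared.
reported_jaccard = jaccard_from_counts(557, 246, 59)

# Placeholder identifiers, used only to exercise the set-based version.
toy = overlap_summary(["A", "B", "C"], ["B", "C", "D", "E"])
print(reported_jaccard, toy)
```

With the reported counts the shared genes make up well under a tenth of the combined list, which is in line with the low consistency the authors describe.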