• Title/Abstract/Keywords: microarray expression data

Statistical Methods for Gene Expression Data

  • Kim, Choongrak
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 1
    • /
    • pp.59-77
    • /
    • 2004
  • Since the introduction of the DNA microarray, a revolutionary high-throughput biological technology, many papers have been published on the analysis of gene expression data from microarrays. In this paper we review most papers relevant to cDNA microarray data, classify them from the point of view of statistical methods, and present some statistical methods that deserve consideration and future study.

A Method for Microarray Data Analysis based on Bayesian Networks using an Efficient Structural Learning Algorithm and Data Dimensionality Reduction

  • 황규백;장정호;장병탁
    • Journal of KIISE: Software and Applications
    • /
    • Vol. 29, No. 11
    • /
    • pp.775-784
    • /
    • 2002
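  • Microarray data obtained with DNA chip technology measure the expression levels of thousands of genes in a cell or tissue at once, and are used for cancer diagnosis based on gene expression patterns, prediction of gene function, and similar tasks. Among the various data analysis techniques, the Bayesian network can represent the relationships among the attributes of the data in graph form. This is useful for revealing, through the analysis of microarray data, the relationships between multiple genes and tissue characteristics (such as cancer type). However, most microarray data are sparse, which makes it difficult to apply Bayesian networks and other analysis techniques. In this paper, we use an efficient structural learning algorithm and data dimensionality reduction for Bayesian network-based microarray data analysis. The proposed method was applied to the NCI60 data set, a real microarray data set, and its usefulness was evaluated by how accurately the Bayesian network learned from the data represents known biological facts.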

Microarray Data Sharing System

  • 윤지희;홍동완;이종근
    • The Journal of the Korea Contents Association
    • /
    • Vol. 9, No. 8
    • /
    • pp.18-31
    • /
    • 2009
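  • Recently, confidence in the quality and reproducibility of microarray experimental data has been increasing, and demand for sharing and using microarray data is growing rapidly. However, publicly available domestic and international microarray data have different data items and formats depending on the experimental method, platform, and so on, which makes practical access to and use of the data difficult. In this paper, we propose a microarray data sharing system that can efficiently search, share, and integrate existing microarray data that differ in experimental platform, data format, normalization technique, and analysis method. The proposed system integrates distributed microarray data using web service technology, and users at each site can download data found through UDDI after it has been automatically converted into a common data structure based on the MGED standard. The defined common data structure consists of IDF, ADF, SDRF, and EDF, serves as a template that can integrate microarrays of various structures, and can be stored as MAGE-ML, MAGE-TAB, or XML Schema documents. In addition, the system's automatic data submitter, file manager, and other components provide various additional functions for microarray data sharing.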

Quality Control Usage in High-Density Microarrays Reveals Differential Gene Expression Profiles in Ovarian Cancer

  • Villegas-Ruiz, Vanessa;Moreno, Jose;Jacome-Lopez, Karina;Zentella-Dehesa, Alejandro;Juarez-Mendez, Sergio
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 17, No. 5
    • /
    • pp.2519-2525
    • /
    • 2016
  • There are several existing reports of microarray chip use for assessment of altered gene expression in different diseases. In fact, over 1.5 million assays of this kind have been performed over the last twenty years, and they have influenced clinical and translational research studies. The most commonly used DNA microarray platforms are Affymetrix GeneChip and Quality Control Software along with their GeneChip Probe Arrays. These chips are created using several quality controls to confirm the success of each assay, but the actual impact of these controls on gene expression profiles was not analyzed until several bioinformatics tools for this purpose appeared. Here we performed a data mining analysis, specifically focused on ovarian cancer as well as healthy ovarian tissue and ovarian cell lines, in order to confirm quality control results and the associated variation in gene expression profiles. The microarray data used in our research were downloaded from ArrayExpress and Gene Expression Omnibus (GEO) and analyzed with Expression Console Software using the RMA, MAS5 and PLIER algorithms. The gene expression profiles were obtained using Partek Genomics Suite v6.6, and data were visualized using principal component analysis, heat maps, and Venn diagrams. Microarray quality control analysis showed that roughly 40% of the microarray files were false negative, demonstrating over- and under-estimation of expressed genes. Additionally, we confirmed the results by performing a second analysis using independent samples. About 70% of the significantly expressed genes were correlated in both analyses. These results demonstrate the importance of appropriate microarray processing to obtain a reliable gene expression profile.

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics
    • /
    • Vol. 9, No. 1
    • /
    • pp.19-27
    • /
    • 2011
  • Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata into MIAME elements. We also discriminated non-preprocessed raw datasets from processed ones by using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining. We localized 2,967 Platforms, 4,867 Series and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface to extract them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.

Biological Pathway Extension Using Microarray Gene Expression Data

  • Chung, Tae-Su;Kim, Ji-Hun;Kim, Kee-Won;Kim, Ju-Han
    • Genomics & Informatics
    • /
    • Vol. 6, No. 4
    • /
    • pp.202-209
    • /
    • 2008
  • Biological pathways are known as collections of knowledge about certain biological processes. Although knowledge about a pathway is quite significant for further analysis, it covers only a tiny portion of the genes that exist. In this paper, we suggest a model to extend each individual pathway using microarray expression data, based on existing knowledge about the pathway. We use the Rosetta compendium dataset to extend pathways of Saccharomyces cerevisiae obtained from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. Before applying our model, we verify the underlying assumption that microarray data reflect the interaction knowledge contained in a pathway, and we evaluate our scoring system by introducing a performance function. In the last step, we validate the proposed candidates with the help of another type of biological information. In summary, we introduce a pathway extension model that uses the pathway's intrinsic structure and microarray expression data. The model provides suitable candidate genes for extending each single biological pathway.

Poor Correlation Between the New Statistical and the Old Empirical Algorithms for DNA Microarray Analysis

  • Kim, Ju Han;Kuo, Winston P.;Kong, Sek-Won;Ohno-Machado, Lucila;Kohane, Isaac S.
    • Genomics & Informatics
    • /
    • Vol. 1, No. 2
    • /
    • pp.87-93
    • /
    • 2003
  • The DNA microarray is currently the most prominent tool for investigating large-scale gene expression data. Different algorithms for measuring gene expression levels from scanned images of microarray experiments may significantly affect the subsequent steps of functional genomic analysis. Affymetrix® recently introduced high-density microarrays and new statistical algorithms in Microarray Suite (MAS) version 5.0®. Very high correlations (0.92 - 0.97) between the new algorithms and the old algorithms (MAS 4.0) across several species and conditions were reported. We found that the column-wise array correlations tended to be much higher than the row-wise gene correlations, which may be much more meaningful in subsequent higher-order data analyses, including clustering and pattern analyses. In this paper, we not only illustrate a detailed comparison of the two sets of algorithms but also describe the impact of introducing the new algorithms on further clustering analysis of microarray data, and possible pitfalls in mixing the old and the new algorithms.
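
The distinction between column-wise (per-array) and row-wise (per-gene) correlation in the abstract above can be illustrated with a small sketch. Everything here is an assumption for illustration: the data are synthetic, the variable names (`old_values`, `new_values`) are hypothetical, and the code does not reproduce the MAS 4.0 or MAS 5.0 algorithms. It only shows why two sets of measurements can agree closely per array while agreeing much less per gene, when most of the variance lies between genes rather than across arrays.

```python
import numpy as np

def columnwise_array_correlations(a, b):
    """Pearson r between matching columns (arrays) of two genes x arrays matrices."""
    return np.array([np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])])

def rowwise_gene_correlations(a, b):
    """Pearson r between matching rows (genes) of two genes x arrays matrices."""
    return np.array([np.corrcoef(a[i, :], b[i, :])[0, 1] for i in range(a.shape[0])])

# Synthetic data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 8

# Large spread between genes, small variation of each gene across arrays.
gene_baseline = rng.normal(8.0, 2.0, size=(n_genes, 1))
condition_effect = rng.normal(0.0, 0.3, size=(n_genes, n_arrays))
true_signal = gene_baseline + condition_effect

# Two "algorithms" modeled simply as the same signal plus independent noise.
old_values = true_signal + rng.normal(0.0, 0.3, size=true_signal.shape)
new_values = true_signal + rng.normal(0.0, 0.3, size=true_signal.shape)

col_r = columnwise_array_correlations(old_values, new_values)
row_r = rowwise_gene_correlations(old_values, new_values)

print(f"mean column-wise (array) r: {col_r.mean():.3f}")
print(f"mean row-wise (gene) r:     {row_r.mean():.3f}")
```

In this toy setting the per-array correlations are dominated by the between-gene spread, so they stay high even though the per-gene profiles, which are what clustering typically operates on, agree far less.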

Gene Expression study of human chromosomal aneuploid

  • 이수만
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2006, Principles and Practice of Microarray for Biomedical Researchers
    • /
    • pp.98-107
    • /
    • 2006
  • Chromosomal copy number changes (aneuploidies) are common in human populations. An extra chromosome can affect gene expression at the whole-genome level. Using gene expression microarray analysis, we aim to find aberrant gene expression due to aneuploidy in Klinefelter (+X) and Down syndrome (+21). We analyzed the inactivation status of X-linked genes in Klinefelter Syndrome (KS) by using an X-linked cDNA microarray and cSNP analysis. We analyzed the expression of 190 X-linked genes by cDNA microarray in lymphocytes from five KS patients and five females (XX), with normal male (XY) controls. The cDNA microarray experiments and cSNP analysis showed that the differentially expressed genes were similar between KS and XX cases. To analyze differential gene expression in Down Syndrome (DS), amniotic fluid (AF) cells were collected from 12 pregnancies at 16 to 18 weeks of gestation in DS (n=6) and normal (n=6) subjects. We also analyzed AF cells with a DNA microarray system and compared the chip data with two-dimensional protein gel analysis of amniotic fluid. Our data may provide the basis for a more systematic identification of biological markers of fetal DS, leading to an improved understanding of the pathogenesis of fetal DS.

Network-based Microarray Data Analysis Tool

  • Park, Hee-Chang;Ryu, Ki-Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 1
    • /
    • pp.53-62
    • /
    • 2006
  • DNA microarray data analysis is a new technology for investigating the expression levels of thousands of genes simultaneously. Because DNA microarray data structures are varied and complex, the data are generally stored in databases so that they can be accessed and managed effectively. However, the data are difficult to analyze and manage when they are spread across several database management systems or stored as files. Existing analysis tools for DNA microarray data suffer from complicated instructions and from dependence on data types and operating systems. In this paper, we design and implement a network-based analysis tool for obtaining useful information from DNA microarray data. With this tool, DNA microarray data can be analyzed effectively without special knowledge of, or training in, data types and analytical methods.

Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression

  • 김재희;김태훈
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 26, No. 3
    • /
    • pp.389-399
    • /
    • 2013
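  • This study introduces Gaussian process regression and presents a case in which it is applied to time-course microarray gene expression data. Through simulations of a gene screening method that fits Gaussian process regression and uses the log marginal likelihood ratio, we computed sensitivity, specificity, false discovery rate, and related measures to show its utility as a screening method. For real yeast cell cycle data, we fitted Gaussian process regression with a squared exponential covariance function, screened differentially expressed genes using the log marginal likelihood ratio, then performed Gaussian model-based clustering on the screened genes and assessed cluster validity with silhouette values.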
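
The screening idea in the abstract above, comparing the log marginal likelihood of a Gaussian process with a squared exponential covariance against a simpler model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the null model is taken to be noise only, hyperparameters are chosen by a coarse grid search rather than gradient-based optimization, each profile is standardized first, and the grids, function names, and synthetic profiles are all hypothetical.

```python
import itertools
import numpy as np

def se_kernel(t, length_scale, signal_var):
    """Squared exponential covariance matrix for time points t."""
    diff = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def log_marginal_likelihood(y, K):
    """log N(y | 0, K), computed through a Cholesky factorization of K."""
    n = y.size
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical hyperparameter grids (assumptions, chosen only for this sketch).
LENGTH_SCALES = (0.5, 1.0, 2.0, 4.0)
SIGNAL_VARS = (0.25, 0.5, 1.0, 2.0)
NOISE_VARS = (0.01, 0.05, 0.1, 0.5, 1.0)

def log_ml_ratio(t, y):
    """Best SE-kernel log marginal likelihood minus best noise-only one."""
    y = (y - y.mean()) / y.std()  # standardize the expression profile
    eye = np.eye(y.size)
    null = max(log_marginal_likelihood(y, nv * eye) for nv in NOISE_VARS)
    alt = max(
        log_marginal_likelihood(y, se_kernel(t, ls, sv) + nv * eye)
        for ls, sv, nv in itertools.product(LENGTH_SCALES, SIGNAL_VARS, NOISE_VARS)
    )
    return alt - null

# Synthetic profiles: one gene with a smooth periodic pattern, one pure noise.
rng = np.random.default_rng(1)
t = np.arange(18.0)
periodic_gene = np.sin(2.0 * np.pi * t / 6.0) + rng.normal(0.0, 0.1, t.size)
flat_gene = rng.normal(0.0, 1.0, t.size)

ratio_periodic = log_ml_ratio(t, periodic_gene)
ratio_flat = log_ml_ratio(t, flat_gene)
print(f"log ML ratio, periodic gene: {ratio_periodic:.2f}")
print(f"log ML ratio, noise-only gene: {ratio_flat:.2f}")
```

A gene whose ratio exceeds a chosen threshold would be kept for the later clustering step; how that threshold is set is not stated in the abstract and is left open here.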