• Title/Summary/Keyword: Microarray Data Analysis

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, so a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from among pre-selected genes whose between-gene variation is large relative to the total variation. Using copy-number changes of the selected genes, the study proposes a statistical approach that predicts patients' classes with increased performance by first pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) is proposed to pre-classify homogeneous patients and then predict their classes; a decision tree (DT) is combined with logistic regression on the set of informative genes. TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University. It matched the patients' clinical diagnoses for all but one of the patients classified as high-risk or low-risk, the groups in which prediction performance is critical because clinical treatment demands high sensitivity and specificity. Accuracy under leave-one-out cross-validation (LOOCV) was 83.3%, while the CART and DT classifiers used for comparison performed worse than TLRM.
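
The abstract reports accuracy under leave-one-out cross-validation. As a rough illustration of how such a figure is computed, the sketch below runs LOOCV around a deliberately simple stand-in classifier (nearest class mean on a single score). It is not the TLRM of the paper; the function names and the toy scores are assumptions made for the example.

```python
def nearest_mean_fit(xs, ys):
    """Stand-in classifier (an assumption, not TLRM): store the mean score of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_mean_predict(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def loocv_accuracy(xs, ys, fit, predict):
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical alteration scores and class labels (toy data, not from the paper).
scores = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 1.5]
labels = [0, 0, 0, 1, 1, 1, 1]
accuracy = loocv_accuracy(scores, labels, nearest_mean_fit, nearest_mean_predict)
print(f"LOOCV accuracy: {accuracy:.3f}")
```

Any classifier with the same `fit`/`predict` shape could be passed in instead, which is how a two-stage model would be evaluated under the same protocol.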

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform cluster analyses of yeast cell-cycle microarray expression data. To reflect the characteristics of time-course data, we screen genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, hierarchical clustering with Ward's method, and fuzzy clustering on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.
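
The abstract does not say which FDR procedure was applied to the per-gene tests. A minimal sketch of one common choice, the Benjamini-Hochberg step-up procedure, is given below; using it here is an assumption, and the function name and p-values are illustrative only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    selected = [False] * m
    for idx in order[:cutoff_rank]:
        selected[idx] = True
    return selected


# Hypothetical per-gene p-values from a periodicity test.
gene_p_values = [0.039, 0.001, 0.205, 0.008, 0.041, 0.042, 0.06, 0.074, 0.212, 0.216]
screened = benjamini_hochberg(gene_p_values, alpha=0.05)
print([i for i, keep in enumerate(screened) if keep])
```

Genes flagged `True` would be the ones passed on to the clustering step.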

Comparison of Expression Profiling of Gastric Cancer by Oligonucleotide and cDNA Microarrays (Oligonucleotide Microarray와 cDNA Microarray를 이용한 위암조직의 대단위 유전자 발현 비교)

  • Jung, Kwang-Hwa;Kim, Jung-Kyu;Noh, Ji-Heon;Eun, Jung-Woo;Bae, Hyun-Jin;Lee, Sug-Hyung;Park, Won-Sang;Yoo, Nam-Jin;Lee, Jung-Young;Nam, Suk-Woo
    • YAKHAK HOEJI / v.51 no.3 / pp.179-185 / 2007
  • Gastric cancer is one of the most common malignancies in Korea, but the predominant molecular events underlying gastric carcinogenesis remain unknown. Recently, DNA microarray technology has enabled comprehensive analysis of gene expression levels and has yielded great insight into the molecular nature of cancer. However, despite the power of this technique, technical artifacts and/or bias in the array platform used limit the reliability of the tens of thousands of data points that a microarray experiment produces. Therefore, we applied two different array platforms, an oligonucleotide microarray and a cDNA microarray, to the same human specimens to identify a large-scale molecular signature related to gastric cancer. When thirty sets of matched human gastric cancer and normal tissues were subjected to oligonucleotide microarray analysis, a total of 623 genes were identified as differentially expressed in gastric cancer compared with normal tissues; cDNA microarray analysis identified 252 genes. In addition, forty-three outlier genes, which reflect the characteristic expression signature of gastric cancer regardless of array platform and analytical protocol, were recapitulated from the two expression profiles. In conclusion, we identified robust large-scale molecular changes in gastric cancer by applying two different DNA microarray platforms, which may facilitate understanding of the molecular carcinogenesis of gastric cancer.

Performance of the Agilent Microarray Platform for One-color Analysis of Gene Expression

  • Song, Sunny;Lucas, Anne;D'Andrade, Petula;Visitacion, Marc;Tangvoranuntakul, Pam;Fulmer-Smentek, Stephanie
    • Proceedings of the Korean Society for Bioinformatics Conference / 2006.02a / pp.78-78 / 2006
  • Gene expression analysis can be performed by one-color (intensity-based) or two-color (ratio-based) microarray platforms depending on the specific applications and needs of the researcher. The traditional two-color approach is well founded from a historical and scientific standpoint, and the one-color approach, when paired with high quality microarrays and a robust workflow, offers additional flexibility in experimental design. Two of the major requirements of any microarray platform are system reproducibility, which provides the means for high confidence experiments and accurate comparison across multiple samples; and high sensitivity, for the detection of significant gene expression changes, including small fold changes across multiple gene sets. Each of these requirements is fulfilled by the Agilent One-color Gene Expression Platform as illustrated by the data included in this study. As a result, researchers have the ability to take advantage of the enhanced performance and sensitivity of Agilent's 60-mer oligonucleotide microarrays, and experience the first commercial microarray platform compatible with both one- and two-color detection.

Global Optimization of Clusters in Gene Expression Data of DNA Microarrays by Deterministic Annealing

  • Lee, Kwon Moo;Chung, Tae Su;Kim, Ju Han
    • Genomics & Informatics / v.1 no.1 / pp.20-24 / 2003
  • The analysis of DNA microarray data is one of the most important tasks in functional genomics research. The matrix representation of microarray data, together with its successive 'optimal' incisional hyperplanes, is a useful platform for developing optimization algorithms that determine the optimal partitioning of a pairwise proximity matrix representing a completely connected, weighted graph. We developed a Deterministic Annealing (DA) approach to determine the successive optimal binary partitions. The DA algorithm demonstrated good performance, with the ability to find the 'globally optimal' binary partitions. In addition, objects that have not been clustered at a small non-zero temperature are considered very sensitive to even small randomness and can be used to estimate the reliability of the clustering.
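
To make the annealing idea concrete, here is a small sketch of deterministic annealing for a single binary split. It uses the centroid-based formulation on one-dimensional points with a squared-distance cost, which is a simplification chosen for brevity and not the proximity-matrix partitioning described in the abstract. The starting temperature, cooling rate, and all names are assumptions.

```python
import math


def deterministic_annealing_1d(points, t_final=0.01, cooling=0.9, sweeps=30):
    """Split 1-D points into two groups by gradually cooling soft (Gibbs) assignments."""
    n = len(points)
    mean = sum(points) / n
    variance = sum((x - mean) ** 2 for x in points) / n
    # Both centroids start near the global mean; the small offset breaks the symmetry.
    offset = 0.01 * math.sqrt(variance)
    centroids = [mean - offset, mean + offset]
    # Start roughly where a single cluster begins to split for this cost (assumed heuristic).
    temperature = 2.0 * variance
    while temperature > t_final:
        for _ in range(sweeps):
            weighted_sum = [0.0, 0.0]
            weight_total = [0.0, 0.0]
            for x in points:
                costs = [(x - c) ** 2 for c in centroids]
                base = min(costs)  # subtract the minimum cost to avoid underflow
                gibbs = [math.exp(-(cost - base) / temperature) for cost in costs]
                z = sum(gibbs)
                for k in range(2):
                    p = gibbs[k] / z  # soft membership of x in group k
                    weighted_sum[k] += p * x
                    weight_total[k] += p
            centroids = [weighted_sum[k] / weight_total[k] for k in range(2)]
        temperature *= cooling
    # Harden the assignments once the temperature is low.
    labels = [0 if (x - centroids[0]) ** 2 <= (x - centroids[1]) ** 2 else 1 for x in points]
    return centroids, labels


# Toy data with two well-separated groups.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
final_centroids, final_labels = deterministic_annealing_1d(data)
print(final_centroids, final_labels)
```

At high temperature every point belongs almost equally to both groups, and the memberships sharpen as the temperature falls. Points whose membership is still close to one half at a small temperature correspond to the "not yet clustered" objects that the abstract uses as a reliability signal.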

CLUSTERING DNA MICROARRAY DATA BY STOCHASTIC ALGORITHM

  • Shon, Ho-Sun;Kim, Sun-Shin;Wang, Ling;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.438-441 / 2007
  • Recently, advances in molecular biology and engineering technology have made it possible, with DNA microarrays, to observe thousands of genes and their state of variation in tissue samples from living bodies. With DNA microarrays, it is possible to construct groups of genes with similar expression patterns and to grasp the progress and variation of genes. This paper carries out cluster analysis, which aims to discover biological subgroups or classes from gene expression information. The purpose of this paper is thus to predict new, previously unknown classes; open leukaemia data are used for the experiment, and the MCL (Markov CLustering) algorithm is applied as the analysis method. The MCL algorithm is based on probability and graph flow theory. MCL simulates random walks on a graph, using Markov matrices to determine the transition probabilities among the nodes of the graph. In detail, the method first computes distances using the Euclidean distance and applies the MCL algorithm, then tunes the inflation and diagonal factors, which are the tuning parameters, and finally obtains a threshold from the average of each column to distinguish one class from another. Our method improves accuracy by using this threshold, namely the average of each column. Our experimental results show about 70% accuracy on average against the previously known classes. For comparative evaluation against other algorithms, the proposed method was compared with the SOM (Self-Organizing Map) clustering algorithm, a neural-network method, and with hierarchical clustering. The method shows better results than hierarchical clustering. Further study should examine whether similar results are obtained when the inflation parameter found in our experiment is applied to other gene expression data. We are also trying to develop a systematic method to improve the accuracy by regulating the factors mentioned above.
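
The core MCL loop alternates expansion (squaring the Markov matrix, i.e. taking two random-walk steps) with inflation (raising entries to a power and renormalizing columns). The sketch below shows that generic loop on a small unweighted graph. It omits the Euclidean-distance preprocessing and the column-average threshold described in the abstract, and the function names, the self-loop weight of 1, and the example graph are assumptions.

```python
def normalize_columns(matrix):
    """Scale each column to sum to 1 so the matrix is column-stochastic."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix (two steps of the random walk)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power, then renormalize columns."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-6):
    """Run a fixed number of expansion/inflation rounds and read clusters off the rows."""
    n = len(adjacency)
    # Add self-loops (the "diagonal factor"; a weight of 1 is assumed here).
    matrix = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Rows that keep non-negligible mass are attractors; their support is a cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > tol)
        if members:
            clusters.add(members)
    return clusters


# Two triangles joined by a single edge between nodes 2 and 3 (toy graph).
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged, inflation=2.0))
```

A larger `inflation` value tends to produce more, smaller clusters, which is why the abstract treats it as a parameter to tune per data set.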

A Method of Identifying Disease-related Significant Pathways Using Time-Series Microarray Data (시간열 마이크로어레이 데이터를 이용한 질병 관련 유의한 패스웨이 유전자 집합의 검출)

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.5 / pp.17-24 / 2010
  • Recently, the identification of biomarkers for disease diagnosis and prognosis has been actively studied. In particular, much attention has been paid to finding pathway gene-sets that are differentially expressed in disease patients, rather than individual gene markers. In this paper we propose a novel method to identify disease-related pathway gene-sets from time-series microarray data. For this purpose, we first compute individual gene scores using maSigPro (microarray Significant Profiles) and then arrange all the genes in decreasing order of their scores. The rank of each gene in the entire list is used to evaluate the statistical significance of candidate gene-sets with the Wilcoxon rank sum test. Candidate gene-sets are generated from MSigDB (Molecular Signatures Database) pathway information. The experiment was conducted with prostate cancer time-series microarray data, and the results showed the usefulness of the proposed method, which correctly identified 6 of the 7 biological pathways already known to be related to prostate cancer.
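
The rank-based significance step can be sketched as follows. Given a gene list already ordered by decreasing score, the ranks of a candidate gene-set are summed and compared with their expectation under random placement, using the normal approximation to the Wilcoxon rank sum statistic. This is a simplified illustration rather than the authors' implementation: the one-sided alternative, the absence of a continuity correction, and the gene names are all assumptions, and gene scoring with maSigPro is not shown.

```python
import math


def gene_set_rank_sum_p(ranked_genes, gene_set):
    """One-sided p-value: small when gene-set members sit near the top of the ranked list."""
    total = len(ranked_genes)
    # Rank 1 is the highest-scoring gene.
    ranks = [pos for pos, gene in enumerate(ranked_genes, start=1) if gene in gene_set]
    n = len(ranks)
    if n == 0 or n == total:
        raise ValueError("gene set must contain some, but not all, of the ranked genes")
    rank_sum = sum(ranks)
    # Mean and variance of the rank sum when the n genes are placed at random (no ties).
    mean = n * (total + 1) / 2
    variance = n * (total - n) * (total + 1) / 12
    z = (rank_sum - mean) / math.sqrt(variance)
    # Lower-tail probability of a standard normal at z.
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Hypothetical ranked list of 20 genes and two candidate pathway gene-sets.
ranked = [f"g{i}" for i in range(1, 21)]
top_heavy_set = {"g1", "g2", "g3", "g4"}
spread_set = {"g5", "g10", "g11", "g16"}
p_top = gene_set_rank_sum_p(ranked, top_heavy_set)
p_spread = gene_set_rank_sum_p(ranked, spread_set)
print(p_top, p_spread)
```

In practice the p-values for all candidate pathways would also need a multiple-testing adjustment before pathways are reported as significant.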

Mining of Subspace Contrasting Sample Groups in Microarray Data (마이크로어레이 데이터의 부공간 대조 샘플집단 마이닝)

  • Lee, Kyung-Mi;Lee, Keon-Myung
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.5 / pp.569-574 / 2011
  • In this paper, we introduce the subspace contrasting group identification problem and propose an algorithm to solve it. To identify contrasting groups, the algorithm first determines two groups whose attribute values fall in one of the contrasting ranges specified by the analyst, and then searches for contrasting groups while increasing the dimension of the subspaces, using an association rule mining strategy. Because microarray data are likely to have tens of thousands of dimensions, it is burdensome to find all contrasting groups over all possible subspaces by generating queries. The proposed method is useful in that it finds those contrasting groups without the analyst's involvement.