• Title/Summary/Keyword: Microarray Data Classification


Gene Selection and Classification by Partial Least Squares and Principal Component Analysis (부분최소자승법과 주성분분석을 이용한 유전자 선택과 분류)

  • Park, Hoseok;Kim, Hey-Jin;Park, Seugj in;Bang, Sung-Yang
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.598-600 / 2001
  • DNA chip technology enables us to monitor the expression of thousands of genes per sample simultaneously. Typically, DNA microarray data have at least several thousand variables (genes) but a relatively small number of samples. Feature (gene) selection by dimensionality reduction is therefore necessary for efficient data analysis. In this paper we employ the partial least squares (PLS) method for gene selection and the principal component analysis (PCA) method for classification. The usefulness of PLS is verified by computer simulations.
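The abstract gives no formulas, so the following is only a minimal sketch of how PLS can rank genes, assuming binary class labels coded 0/1 and using the first PLS1 weight vector (proportional to Xc^T yc, with column-centered X and centered y) as the gene score. The function names are hypothetical, and the PCA-based classification step is not reproduced.

```python
import math


def center_columns(X):
    """Subtract each column (gene) mean; X is a list of rows (samples x genes)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def pls1_first_weights(X, y):
    """First PLS1 weight vector: w = Xc^T yc, scaled to unit length."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(len(y))) for j in range(len(X[0]))]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0  # guard against an all-zero w
    return [v / norm for v in w]


def rank_genes_by_pls(X, y, k):
    """Indices of the k genes with the largest |weight| on the first PLS component."""
    w = pls1_first_weights(X, y)
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))[:k]


# Toy data: 4 samples x 3 genes; gene 0 tracks the label, gene 2 is constant.
X = [[1.0, 0.2, 5.0],
     [1.2, 0.1, 5.0],
     [3.0, 0.3, 5.0],
     [3.1, 0.0, 5.0]]
y = [0, 0, 1, 1]
print(rank_genes_by_pls(X, y, 2))
```

Later PLS components would be extracted after deflating X; this sketch stops at the first component, which is often enough to illustrate supervised gene ranking.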


Gene Discovery Analysis from Mouse Embryonic Stem Cells Based on Time Course Microarray Data

  • Suh, Young Ju;Cho, Sun A;Shim, Jung Hee;Yook, Yeon Joo;Yoo, Kyung Hyun;Kim, Jung Hee;Park, Eun Young;Noh, Ji Yeun;Lee, Seong Ho;Yang, Moon Hee;Jeong, Hyo Seok;Park, Jong Hoon
    • Molecules and Cells / v.26 no.4 / pp.338-343 / 2008
  • Embryonic stem cells are a powerful tool for investigating early development in vitro. Studying embryonic stem cell-mediated neuronal differentiation improves our understanding of the mechanisms involved in embryonic neuronal development. We investigated changes in expression profiles using time-course cDNA microarrays to identify clues to the signaling network of neuronal differentiation. For short time-course microarray data, pattern analysis based on the quadratic regression method is an effective approach for identifying and classifying a variety of expressed genes with biological relevance. Using this method, we studied the mRNA expression patterns of embryonic stem cells at each of 5 stages after neuronal induction. Pattern analysis identified a total of 316 genes (3.1%), including 166 (1.7%) informative genes, in 8 possible expression patterns. Among the selected genes associated with the nervous system, all three genes showing a linearly increasing pattern over time and one gene showing a decreasing pattern over time were verified by RT-PCR. This suggests that a linear increase in gene expression over time may be associated with embryonic development. The genes Tcfap2c, Ttr, Wnt3a, Btg2 and Foxk1, which were both detected by pattern analysis and verified by RT-PCR, may be candidate markers associated with the development of the nervous system. Our study shows that pattern analysis using the quadratic regression method is very useful for investigating time-course cDNA microarray data and is biologically meaningful for the study of embryonic stem cells.
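A minimal sketch of quadratic-regression pattern labeling, assuming the 5 stages are equally spaced. It fits the quadratic through orthogonal polynomial contrasts, which gives the same least-squares curve as fitting b0 + b1*t + b2*t^2; the dead-zone threshold `tol` and the trend/curvature labels are hypothetical. The abstract does not say how its 8 patterns are defined, so the "3 x 3 sign combinations minus flat" scheme below is only one possibility.

```python
# Orthogonal polynomial contrasts for 5 equally spaced time points (t = 0..4).
LINEAR = [-2, -1, 0, 1, 2]      # equals t - 2
QUADRATIC = [2, -1, -2, -1, 2]  # equals (t - 2)**2 - 2


def trend_coefficients(values):
    """Least-squares coefficients of the linear and quadratic orthogonal terms."""
    if len(values) != 5:
        raise ValueError("this sketch assumes exactly 5 equally spaced time points")
    b_lin = sum(c * v for c, v in zip(LINEAR, values)) / sum(c * c for c in LINEAR)
    b_quad = sum(c * v for c, v in zip(QUADRATIC, values)) / sum(c * c for c in QUADRATIC)
    return b_lin, b_quad


def label(b, tol, positive, negative):
    """Three-way sign label with a dead zone of half-width tol around zero."""
    if abs(b) <= tol:
        return "none"
    return positive if b > 0 else negative


def expression_pattern(values, tol=0.1):
    """(trend, curvature) label; 3 x 3 combinations minus (none, none) leaves 8 patterns."""
    b_lin, b_quad = trend_coefficients(values)
    return (label(b_lin, tol, "increasing", "decreasing"),
            label(b_quad, tol, "convex", "concave"))


print(expression_pattern([1.0, 2.0, 3.0, 4.0, 5.0]))
print(expression_pattern([4.0, 1.0, 0.0, 1.0, 4.0]))
```

Because the contrasts are orthogonal, each coefficient is a simple projection, and the quadratic coefficient equals the t^2 coefficient of the ordinary quadratic fit.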

The System of Microarray Data Classification Using Significant Gene Combination Method Based on Neural Network (신경망 기반의 유전자조합을 이용한 마이크로어레이 데이터 분류 시스템)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.7 / pp.1243-1248 / 2008
  • Recent advances in bioinformatics technology make micro-level experiments possible: we can observe the expression pattern of an entire genome on a chip and analyze the interactions of thousands of genes at the same time. In this paper, we used cDNA microarrays of 3,840 genes obtained from a neuronal differentiation experiment on cortical stem cells of white mice with cancer. After normalization, we extracted a significant gene list with the similarity-scale combination method proposed in this paper, built class classification models, and compared the results of the existing decision tree (DT), naive Bayes (NB), and support vector machine (SVM) classifiers and a multilayer perceptron neural network classifier. With the multilayer perceptron on 200 genes selected by combining the Pearson correlation coefficient (PC) and the Euclidean distance (ED), classification accuracy reached 98.84%, better than with the other classifiers.
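A minimal sketch of scoring genes with the two similarity scales named in the abstract, assuming each gene's expression profile is compared with an ideal 0/1 class-label profile. The min-max scaling before ED, the mirrored profile (so that down-regulated genes also score well), and all function names are assumptions, not the paper's definitions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def min_max(v):
    """Rescale a profile to [0, 1] so it is comparable with a 0/1 ideal profile."""
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in v]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_pc(genes, ideal):
    """Gene names sorted by |Pearson r| with the ideal profile, best first."""
    return sorted(genes, key=lambda g: -abs(pearson(genes[g], ideal)))


def rank_by_ed(genes, ideal):
    """Gene names sorted by ED to the ideal profile or its mirror image, best first."""
    flipped = [1 - v for v in ideal]

    def dist(g):
        s = min_max(genes[g])
        return min(euclidean(s, ideal), euclidean(s, flipped))

    return sorted(genes, key=dist)


ideal = [0, 0, 1, 1]  # class labels of the 4 samples, used as the ideal profile
genes = {"g1": [0.1, 0.2, 0.9, 1.0],
         "g2": [0.5, 0.4, 0.6, 0.5],
         "g3": [1.0, 0.9, 0.1, 0.0]}
print(rank_by_pc(genes, ideal), rank_by_ed(genes, ideal))
```

Both scales agree here that g2, whose profile barely follows the class labels, is the least informative gene.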

The Implement of System on Microarray Classification Using Combination of Significant Gene Selection Method (정보력 있는 유전자 선택 방법 조합을 이용한 마이크로어레이 분류 시스템 구현)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.2 / pp.315-320 / 2008
  • A large amount of data obtained from such research can now be given new meaning, serving the original purpose of whole-genome efforts such as the Human Genome Project. In this context, building gene expression analysis systems and base-sequence analysis systems is attracting renewed attention. Since particular tumor subclasses have been found to be related to particular chromosomes, microarrays have begun to be used for diagnosis through cancer classification and prediction based on gene expression information. In this paper, we used cDNA microarrays of 3,840 genes obtained from a neuronal differentiation experiment on cortical stem cells of white mice with cancer, built a system that extracts informative gene lists after normalization, and proposed a combination method for selecting more significant genes. The feasibility of the proposed system and method was verified experimentally: the PC-ED combination achieved 98.74% accuracy and an MSE of 0.04%, improving classification performance over gene lists generated with a single similarity scale.
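The abstract does not define how the PC and ED gene lists are combined. One hypothetical rule, sketched below, is rank averaging: each gene's positions in the two ranked lists (best first) are averaged, and the genes with the best average are kept. The function `combine_rankings` and its tie-breaking are illustrative assumptions only.

```python
def combine_rankings(rank_a, rank_b, k):
    """Combine two ranked gene lists (best first) by average rank position.

    Hypothetical rule: only genes present in both lists are kept; ties are
    broken by the better single rank, then by gene name.
    """
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    common = set(pos_a) & set(pos_b)

    def key(g):
        return ((pos_a[g] + pos_b[g]) / 2, min(pos_a[g], pos_b[g]), g)

    return sorted(common, key=key)[:k]


pc_list = ["g7", "g2", "g5", "g1", "g9"]  # e.g. ranked by Pearson correlation
ed_list = ["g2", "g5", "g7", "g9", "g1"]  # e.g. ranked by Euclidean distance
print(combine_rankings(pc_list, ed_list, 3))
```

Rank averaging ignores the raw score scales, which is convenient when one measure is a correlation and the other a distance.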

A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization (mRMR과 수정된 입자군집화 방법을 이용한 다범주 분류를 위한 최적유전자집단 구성)

  • Lee, Sunho
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.683-696 / 2020
  • The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. First, within the minimum redundancy maximum relevance (mRMR) framework, several statistics for ranking differentially expressed genes and K-means clustering for reducing redundancy between genes are used to filter the data. Second, a modified particle swarm optimization selects a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.
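A minimal sketch of the greedy mRMR idea (maximize relevance minus mean redundancy). Classic mRMR uses mutual information; here, as a swapped-in assumption, relevance is the one-way ANOVA F statistic scaled to [0, 1] and redundancy is the mean absolute Pearson correlation with already selected genes. The K-means filtering and the modified particle swarm optimization stage from the abstract are not reproduced, and all names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one gene's expression across classes."""
    classes = sorted(set(labels))
    n, k = len(values), len(classes)
    grand = sum(values) / n
    ss_between = ss_within = 0.0
    for c in classes:
        grp = [v for v, l in zip(values, labels) if l == c]
        m = sum(grp) / len(grp)
        ss_between += len(grp) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in grp)
    return (ss_between / (k - 1)) / (max(ss_within, 1e-12) / (n - k))


def mrmr_select(X, labels, k):
    """Greedy mRMR, difference form. X is genes x samples; returns gene indices."""
    rel = [f_statistic(g, labels) for g in X]
    top = max(rel) or 1.0
    rel = [r / top for r in rel]  # scale relevance to [0, 1] to match |r|
    selected = [max(range(len(X)), key=lambda j: rel[j])]
    while len(selected) < min(k, len(X)):
        def score(j):
            redundancy = sum(abs(pearson(X[j], X[s])) for s in selected) / len(selected)
            return rel[j] - redundancy
        candidates = [j for j in range(len(X)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


labels = [0, 0, 1, 1, 2, 2]
X = [[1.0, 1.1, 5.0, 5.1, 9.0, 9.2],       # strongly class-dependent
     [2.0, 2.2, 10.0, 10.2, 18.0, 18.4],   # gene 0 scaled by 2: fully redundant
     [0.0, 0.2, 5.0, 5.2, 0.0, 0.1]]       # high only in class 1
print(mrmr_select(X, labels, 2))
```

The redundant copy of gene 0 is as relevant as gene 0 itself, but its redundancy penalty lets the complementary gene 2 be chosen second.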

Classification of Midinfrared Spectra of Colon Cancer Tissue Using a Convolutional Neural Network

  • Kim, In Gyoung;Lee, Changho;Kim, Hyeon Sik;Lim, Sung Chul;Ahn, Jae Sung
    • Current Optics and Photonics / v.6 no.1 / pp.92-103 / 2022
  • The development of midinfrared (mid-IR) quantum cascade lasers (QCLs) has enabled rapid, high-contrast measurement of the mid-IR spectra of biological tissues. Several studies have compared the mid-IR spectra of colon cancer and noncancerous colon tissues. Most mid-IR spectrum classification studies have used machine-learning-based algorithms, whose results vary depending on the initial data and threshold values. We aim to develop a process for classifying colon cancer and noncancerous colon tissues with a deep-learning-based convolutional-neural-network (CNN) model. First, we convert each midinfrared spectrum into an image for the CNN model, an image-based deep-learning (DL) algorithm. The model is then trained with the CNN algorithm, and its classification rate is evaluated on test data. When tissue microarray (TMA) and routine pathological slides are tested, the ML-based support-vector-machine (SVM) model produces biased results, whereas we confirm that the CNN model distinguishes colon cancer from noncancerous colon tissue. These results demonstrate that the CNN model using midinfrared-spectrum images is effective at classifying colon cancer and noncancerous colon tissue, not only in submillimeter-sized TMA samples but also in routine colon cancer tissue samples a few tens of millimeters in size.
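The abstract does not say how spectra are turned into images or which architecture is used. The sketch below only illustrates the forward pass of one CNN stage (2-D convolution, ReLU, 2x2 max pooling) on a spectrum folded row-wise into a grid. The folding step `spectrum_to_image`, the kernel values, and all names are hypothetical, and no training is shown.

```python
def spectrum_to_image(spectrum, width):
    """Hypothetical imaging step: fold a 1-D spectrum row-wise into a 2-D grid."""
    if len(spectrum) % width:
        raise ValueError("spectrum length must be a multiple of width")
    return [spectrum[i:i + width] for i in range(0, len(spectrum), width)]


def conv2d_valid(img, kernel, bias=0.0):
    """2-D cross-correlation without padding (what CNN 'convolution' layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[bias + sum(kernel[a][b] * img[i + a][j + b]
                        for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise max(0, x) on a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


spectrum = [float(v) for v in range(25)]       # toy 25-point "absorbance" spectrum
image = spectrum_to_image(spectrum, 5)          # 5 x 5 grid
features = max_pool2x2(relu(conv2d_valid(image, [[0, 0], [0, 1]])))
print(features)
```

In a trained network the kernel weights are learned, many kernels run in parallel, and the pooled maps feed further layers and a classifier head.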

Rank-based Multiclass Gene Selection for Cancer Classification with Naive Bayes Classifiers based on Gene Expression Profiles (나이브 베이스 분류기를 이용한 유전발현 데이타기반 암 분류를 위한 순위기반 다중클래스 유전자 선택)

  • Hong, Jin-Hyuk;Cho, Sung-Bae
    • Journal of KIISE:Computer Systems and Theory / v.35 no.8 / pp.372-377 / 2008
  • Multiclass cancer classification based on gene expression profiles, which determines the type of cancer by analyzing the large amount of gene expression data collected with DNA microarray technology, has been actively investigated. Because gene expression data include many genes unrelated to the target cancer, informative genes must be selected to achieve high classification accuracy. Conventional rank-based gene selection methods often rely on ideal marker genes devised for binary classification, which makes them difficult to apply directly to multiclass classification. In this paper, we propose a novel multiclass gene selection method that does not use ideal marker genes but directly analyzes the distribution of gene expression. It measures class discriminability by discretizing gene expression levels into several regions and analyzing the frequency of training samples in each region, and then classifies samples with a naive Bayes classifier. We demonstrate the usefulness of the proposed method on various representative benchmark datasets for multiclass cancer classification.
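A minimal sketch of the two steps the abstract describes: discretize each gene's expression into regions, score genes by how well those regions separate the classes, then classify with naive Bayes on the discretized values. The equal-width bins, the majority-class "purity" score, Laplace smoothing, and the class `GeneRankNB` are all assumptions; the paper's exact discriminability measure is not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def discretize(values, n_bins, lo, hi):
    """Map values to equal-width bins over the training range [lo, hi] (clamped)."""
    width = (hi - lo) / n_bins or 1.0  # constant gene: avoid division by zero
    return [min(n_bins - 1, max(0, int((v - lo) / width))) for v in values]


def discriminability(bins, labels):
    """Hypothetical score: share of samples belonging to the majority class of their bin."""
    per_bin = defaultdict(Counter)
    for b, c in zip(bins, labels):
        per_bin[b][c] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in per_bin.values()) / len(labels)


class GeneRankNB:
    """Rank genes by discriminability, then classify with naive Bayes on bins."""

    def __init__(self, n_bins=3, k=2, alpha=1.0):
        self.n_bins, self.k, self.alpha = n_bins, k, alpha  # alpha: Laplace smoothing

    def fit(self, X, y):
        n_genes = len(X[0])
        self.ranges = [(min(r[g] for r in X), max(r[g] for r in X)) for g in range(n_genes)]
        binned = [discretize([r[g] for r in X], self.n_bins, *self.ranges[g])
                  for g in range(n_genes)]  # genes x samples
        scores = [discriminability(binned[g], y) for g in range(n_genes)]
        self.genes = sorted(range(n_genes), key=lambda g: -scores[g])[:self.k]
        self.class_n = Counter(y)
        # counts[c][g][b]: training samples of class c whose gene g falls in bin b
        self.counts = {c: {g: Counter() for g in self.genes} for c in self.class_n}
        for i, c in enumerate(y):
            for g in self.genes:
                self.counts[c][g][binned[g][i]] += 1
        return self

    def predict(self, x):
        total = sum(self.class_n.values())

        def log_posterior(c):
            s = math.log(self.class_n[c] / total)
            for g in self.genes:
                b = discretize([x[g]], self.n_bins, *self.ranges[g])[0]
                p = (self.counts[c][g][b] + self.alpha) / (self.class_n[c] + self.alpha * self.n_bins)
                s += math.log(p)
            return s

        return max(sorted(self.class_n), key=log_posterior)


X = [[0.0, 3.0, 1.0], [0.5, 1.0, 1.0], [5.0, 2.0, 1.0],
     [5.5, 3.0, 1.0], [10.0, 1.0, 1.0], [9.5, 2.0, 1.0]]
y = ["A", "A", "B", "B", "C", "C"]
model = GeneRankNB(n_bins=3, k=2).fit(X, y)
print(model.genes, model.predict([9.8, 2.0, 1.0]))
```

Gene 0 puts each class in its own bin and gets a perfect score, while the constant gene 2 is ranked last and dropped.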

Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data (유전자 알고리즘과 Feature Wrapping을 통한 마이크로어레이 데이타 중복 특징 소거법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.35 no.8 / pp.463-478 / 2008
  • Because of the high dimensionality of microarray gene expression datasets, machine learning algorithms have typically relied on feature selection techniques to classify them effectively. However, the large number of features compared with the number of samples makes feature selection computationally prohibitive and prone to errors. One traditional feature selection approach is feature filtering, which evaluates one gene at a time. Feature filtering is therefore a univariate approach that cannot account for multivariate correlations. In this paper, we propose a function that measures both class separability and correlations, which addresses this limitation of the feature filtering approach.
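The abstract does not give the proposed function, so the following is only a hypothetical subset criterion of the kind a wrapper or genetic-algorithm search could use as its fitness: mean class separability of the genes in a subset (a binary Fisher ratio squashed to [0, 1)) minus a penalty on their mean pairwise absolute correlation. Unlike a one-gene-at-a-time filter, it scores a whole subset. The weighting `lam` and all names are assumptions.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def separability(values, labels):
    """Binary Fisher ratio F = (m1 - m0)^2 / (v0 + v1), squashed to [0, 1) as F / (1 + F)."""
    g0 = [v for v, l in zip(values, labels) if l == 0]
    g1 = [v for v, l in zip(values, labels) if l == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    v0 = sum((v - m0) ** 2 for v in g0) / len(g0)
    v1 = sum((v - m1) ** 2 for v in g1) / len(g1)
    f = (m1 - m0) ** 2 / (v0 + v1 + 1e-12)
    return f / (1.0 + f)


def subset_score(genes, subset, labels, lam=1.0):
    """Hypothetical criterion: mean separability minus lam * mean pairwise |r|."""
    sep = sum(separability(genes[g], labels) for g in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    red = sum(abs(pearson(genes[a], genes[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return sep - lam * red


labels = [0, 0, 0, 1, 1, 1]
genes = {"a": [1, 2, 3, 4, 5, 6],
         "b": [2, 4, 6, 8, 10, 12],   # a scaled by 2: same separability, r = 1
         "c": [3, 1, 2, 6, 4, 5]}     # same separability, less correlated with a
print(subset_score(genes, ["a", "b"], labels), subset_score(genes, ["a", "c"], labels))
```

A univariate filter would rate {a, b} and {a, c} identically; the correlation term is what lets the subset criterion prefer {a, c}.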

Classification Prediction Error Estimation System of Microarray for a Comparison of Resampling Methods Based on Multi-Layer Perceptron (다층퍼셉트론 기반 리 샘플링 방법 비교를 위한 마이크로어레이 분류 예측 에러 추정 시스템)

  • Park, Su-Young;Jeong, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.2 / pp.534-539 / 2010
  • In genomic studies, thousands of features are collected on relatively few samples. One goal of these studies is to build classifiers that predict the outcome of future observations. Building a classifier involves three inherent steps: significant gene selection, model selection, and prediction assessment. Focusing on prediction assessment, we normalize the microarray data with quantile normalization, which makes the quantiles of all slides equal, and then design a system that compares several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection and analyzes their prediction errors. Leave-one-out cross-validation (LOOCV) generally performs very well, with small MSE and bias, whereas the split-sample method and 2-fold CV perform very poorly with small sample sizes. For computationally burdensome analyses, 10-fold CV may be preferable to LOOCV.
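The key phrase in the abstract is estimating prediction error "in the presence of feature selection": genes must be reselected inside every training fold, otherwise the estimate is optimistically biased. A minimal sketch follows, assuming two classes coded 0/1, a hypothetical mean-difference gene filter, and a nearest-centroid classifier standing in for the paper's multilayer perceptron. Setting `k` to the number of samples gives LOOCV.

```python
import random


def select_genes(X, y, m):
    """Rank genes by |difference of class means|, using training data only."""
    def gap(g):
        a = [r[g] for r, c in zip(X, y) if c == 0]
        b = [r[g] for r, c in zip(X, y) if c == 1]
        if not a or not b:
            raise ValueError("each training fold needs both classes")
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:m]


def nearest_centroid(X_tr, y_tr, genes, x):
    """Predict the class whose centroid (on the selected genes) is closest to x."""
    best, best_d = None, float("inf")
    for c in sorted(set(y_tr)):
        rows = [r for r, l in zip(X_tr, y_tr) if l == c]
        cent = [sum(r[g] for r in rows) / len(rows) for g in genes]
        d = sum((x[g] - mu) ** 2 for g, mu in zip(genes, cent))
        if d < best_d:
            best, best_d = c, d
    return best


def cv_error(X, y, k, m, seed=0):
    """k-fold CV error rate; gene selection is redone inside every training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    wrong = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, m)  # never sees the held-out samples
        wrong += sum(nearest_centroid(X_tr, y_tr, genes, X[i]) != y[i] for i in test)
    return wrong / len(X)


X = [[0.1, 5], [0.3, 2], [0.2, 8], [0.0, 4], [10.1, 3], [9.8, 7], [10.3, 5], [9.9, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(cv_error(X, y, k=len(X), m=1), cv_error(X, y, k=4, m=1))
```

LOOCV refits n times, which is why a smaller k such as 10 can be the practical choice when each fit is expensive.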

Gene Selection using Principal Component Analysis for Molecular Classification (Principal Component Analysis를 이용한 Gene Selection)

  • Lim, Soo-Hong;Sohn, Kirack;Hong, Sung-Yong
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.259-261 / 2005
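  • DNA microarray studies, which generate thousands of gene expression measurements, collect gene expression information useful for diagnosis from tissue and cell samples. New methods using the support vector machine (SVM) have been studied to analyze this kind of data. In this paper, we describe an algorithm that uses the eigenvectors of gene expression data to improve SVM performance and to find genes useful for disease diagnosis. Genes are selectively included in SVM learning according to the eigenvectors, and the classification results reveal how much each added gene affects disease diagnosis, which can help identify the role of genes in a disease.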
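A minimal sketch of eigenvector-based gene ranking: compute the leading eigenvector of the gene covariance matrix by power iteration and rank genes by the absolute value of their loadings. How the paper orders genes and feeds them to the SVM is not specified in the abstract, so the ranking rule and function names are assumptions, and the SVM step is omitted.

```python
import math


def covariance_matrix(X):
    """Gene-by-gene sample covariance for X given as samples x genes."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - means[j] for j in range(p)] for r in X]
    return [[sum(row[a] * row[b] for row in Xc) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenvector(C, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix.

    Assumes the top eigenvalue is unique and the uniform start vector is not
    orthogonal to its eigenvector.
    """
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def rank_genes_by_pc1(X, k):
    """Indices of the k genes with the largest |loading| on the first principal component."""
    v = leading_eigenvector(covariance_matrix(X))
    return sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k]


# Toy data: 4 samples x 3 genes; gene 0 varies most, gene 2 is constant.
X = [[0.0, 1.0, 3.0],
     [10.0, 2.0, 3.0],
     [20.0, 1.0, 3.0],
     [30.0, 2.0, 3.0]]
print(rank_genes_by_pc1(X, 2))
```

In the setting the abstract describes, genes would then be added to SVM training in this order and the change in classification accuracy recorded for each addition.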
