• Title/Summary/Keyword: Informative genes

Informative Gene Selection Method in Tumor Classification

  • Lee, Hyosoo;Park, Jong Hoon
    • Genomics & Informatics
    • /
    • v.2 no.1
    • /
    • pp.19-29
    • /
    • 2004
  • Gene expression profiles may offer more information than morphology and provide an alternative to morphology-based tumor classification systems. Informative gene selection is the task of finding gene subsets that can discriminate between tumor types and that may have a clear biological interpretation. Gene selection is a fundamental issue in gene expression-based tumor classification. In this report, techniques for selecting informative genes are illustrated, and supervised shaving is introduced as a gene selection method in place of a clustering algorithm. The supervised shaving method showed good performance in gene selection and classification, even though it is a clustering algorithm. Almost all of the selected genes are related to leukemia. The expression profiles of 3051 genes were analyzed in 27 acute lymphoblastic leukemia and 11 myeloid leukemia samples. Through these examples, the supervised shaving method was shown to produce biologically significant genes with a classification accuracy of more than $94\%$. In this report, SVM was also shown to be a practicable method for gene expression-based classification.

An Efficient Functional Analysis Method for Micro-array Data Using Gene Ontology

  • Hong, Dong-Wan;Lee, Jong-Keun;Park, Sung-Soo;Hong, Sang-Kyoon;Yoon, Jee-Hee
    • Journal of Information Processing Systems
    • /
    • v.3 no.1
    • /
    • pp.38-42
    • /
    • 2007
  • Microarray data includes tens of thousands of gene expressions simultaneously, so it can be effectively used in identifying the phenotypes of diseases. However, the retrieval of functional information from a large corpus of gene expression data is still a time-consuming task. In this paper, we propose an efficient method for identifying functional categories of differentially expressed genes from a micro-array experiment by using Gene Ontology (GO). Our method is as follows: (1) The expression data set is first filtered to include only genes with mean expression values that differ by at least 3-fold between the two groups. (2) The genes are then ranked based on the t-statistics. The 100 most highly ranked genes are selected as informative genes. (3) The t-value of each informative gene is imposed as a score on the associated GO terms. High-scoring GO terms are then listed with their associated genes and represent the functional category information of the micro-array experiment. A system called HMDA (Hallym Micro-array Data analysis) is implemented on publicly available micro-array data sets and validated. Our results were also compared with the original analysis.
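The three steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the HMDA implementation: the function names are hypothetical, and the abstract does not say which t-statistic is used or how t-values are combined per GO term, so the sketch assumes Welch's t, ranks by absolute value, and sums the absolute t-values for each term.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two lists of expression values (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def score_go_terms(expr_a, expr_b, gene_to_go, min_fold=3.0, top_n=100):
    """Return (GO term, score, genes) tuples sorted by descending score.

    expr_a / expr_b map gene -> list of positive expression values per group
    (at least two values per group, with some spread, so the variance is usable).
    """
    # Step 1: keep genes whose group means differ by at least min_fold.
    kept = []
    for gene in expr_a:
        m_a, m_b = mean(expr_a[gene]), mean(expr_b[gene])
        if max(m_a, m_b) / min(m_a, m_b) >= min_fold:
            kept.append(gene)

    # Step 2: rank by |t| and keep the top_n genes as informative genes.
    t_values = {g: welch_t(expr_a[g], expr_b[g]) for g in kept}
    informative = sorted(t_values, key=lambda g: abs(t_values[g]), reverse=True)[:top_n]

    # Step 3: impose each gene's |t| on its GO terms (assumption: scores are summed).
    scores = defaultdict(float)
    members = defaultdict(list)
    for gene in informative:
        for term in gene_to_go.get(gene, []):
            scores[term] += abs(t_values[gene])
            members[term].append(gene)
    return sorted(((t, scores[t], members[t]) for t in scores),
                  key=lambda item: item[1], reverse=True)
```

Calling `score_go_terms(expr_a, expr_b, gene_to_go)` with two dictionaries of per-group expression values and a gene-to-GO-term mapping returns the high-scoring terms first, each with the genes that contributed to it.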

Finding Informative Genes From Microarray Gene Expression Data Using FIGER-test

  • Choi, Kyoung-Oak;Chung, Hwan-Mook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.5
    • /
    • pp.707-711
    • /
    • 2007
  • Microarray gene expression data are believed to reveal the functions of a living organism through gene expression values. We studied methods for extracting informative genes from microarray gene expression data, for which several approaches exist. To obtain more sophisticated and detailed results, recent research has used intelligent information theories such as fuzzy theory, and some methods add fudge factors to the significance test for more refined results. In this paper, we propose a method for finding informative genes from microarray gene expression data. It combines the difference of means between two groups with a fuzzy membership degree that reflects the variance of the gene expression data. We call this significance test the Fuzzy Information method for Gene Expression data (FIGER). FIGER calculates the FIGER variation ratio and the FIGER membership degree, which show how strongly each object belongs to each group, and from these derives the significance degree of each gene. FIGER focuses on the variation and distribution of the data set to adjust the significance level. Our simulation shows that the FIGER-test is an effective and useful significance test.

Analysis of Partial cDNA Sequence from Human Fetal Liver

  • Kim, Jae-Wha;Song, Jae-Chan;Lee, In-Ae;Lee, Young-Hee;Nam, Myoung-Soo;Hahn, Yoon-Soo;Chung, Jae-Hoon;Choe, In-Seong
    • BMB Reports
    • /
    • v.28 no.5
    • /
    • pp.402-407
    • /
    • 1995
  • Single-run partial cDNA sequencing was conducted on 1,592 randomly selected human fetal liver cDNA clones of Korean origin to isolate novel genes related to liver functions. Each partial cDNA sequence was analyzed by comparing it with the databases GenBank, Protein Information Resource (PIR) and SWISS-PROT Protein Sequence Data Bank. Of the 1,592 cDNA clones reported here, 1,433 (90.0% of the total) were informative cDNA sequences. The other 159 clones were identified as DNA sequences that originated from the cloning vector. Among the 1,433 informative partial cDNA sequences, 851 (59.3%) clones were identical to known human genes. These known genes were classified into 225 different kinds of genes. In addition, 340 clones (23.7%) showed various degrees of homology to previously known human genes. Ninety-four (6.6%) clones contained various repeated sequences. Twenty-four (1.7%) partial cDNA sequences were found, based on database matches, to have considerable homology to known genes from evolutionarily distant organisms such as yeast, rice, Arabidopsis, mouse and rat, whereas 124 (8.7%) had no significant matches. Human homologues of functionally characterized genes from other organisms could be classified as candidates for novel human genes with similar functions. Information from the partial cDNA sequences in this study may facilitate the analysis of genes expressed in human fetal liver.

Building a Classifier for Integrated Microarray Datasets through Two-Stage Approach (2 단계 접근법을 통한 통합 마이크로어레이 데이타의 분류기 생성)

  • Yoon, Young-Mi;Lee, Jong-Chan;Park, Sang-Hyun
    • Journal of KIISE:Databases
    • /
    • v.34 no.1
    • /
    • pp.46-58
    • /
    • 2007
  • Since microarray data capture tens of thousands of gene expression values simultaneously, they can be very useful in identifying the phenotypes of diseases. However, analyses of several microarray datasets that were produced independently with the same biological objectives can yield different results. One of the main reasons is the limited number of samples in a single microarray experiment. To increase classification accuracy, it is desirable to enlarge the sample size by integrating independently conducted microarray datasets and making maximal use of them. In this paper, we propose a novel two-stage approach that first integrates individual microarray datasets, to overcome the problem caused by the limited number of samples, and identifies informative genes, and then builds a classifier using only the informative genes. The classifier built from the larger sample obtained by integrating independent microarray datasets achieves high accuracy (up to a 24.19% increase over the comparison methods), sensitivity, and specificity on an independent test dataset.

The Design Of Microarray Classification System Using Combination Of Significant Gene Selection Method Based On Normalization (표준화 기반 유의한 유전자 선택 방법 조합을 이용한 마이크로어레이 분류 시스템 설계)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.12
    • /
    • pp.2259-2264
    • /
    • 2008
  • Significant genes are defined as genes whose expression level characterizes a specific experimental condition. Genes whose expression levels differ significantly between groups are highly informative about the studied phenomenon. In this paper, the system first normalizes the data with the most widely used of the normalization methods proposed to date, and then detects informative genes using the similarity-measure combination method proposed here. It then compares and analyzes the performance of each normalization method using a multilayer perceptron neural network. Classifying the 200 genes selected by combining PC (Pearson correlation coefficient) and ED (Euclidean distance coefficient) after Lowess normalization with the multilayer perceptron classifier yielded an improved classification performance of 98.84%.

Analysis of allele-specific expression using RNA-seq of the Korean native pig and Landrace reciprocal cross

  • Ahn, Byeongyong;Choi, Min-Kyeung;Yum, Joori;Cho, In-Cheol;Kim, Jin-Hoi;Park, Chankyu
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.32 no.12
    • /
    • pp.1816-1825
    • /
    • 2019
  • Objective: We tried to analyze allele-specific expression in the pig neocortex using bioinformatic analysis of high-throughput sequencing results from the parental genomes and offspring transcriptomes from reciprocal crosses between Korean Native and Landrace pigs. Methods: We carried out sequencing of parental genomes and offspring transcriptomes using next generation sequencing. We subsequently carried out genome scale identification of single nucleotide polymorphisms (SNPs) in two different ways using either individual genome mapping or joint genome mapping of the same breed parents that were used for the reciprocal crosses. Using parent-specific SNPs, allele-specifically expressed genes were analyzed. Results: Because of the low genome coverage (${\sim}4{\times}$) of the sequencing results, most SNPs were non-informative for parental lineage determination of the expressed alleles in the offspring and were thus excluded from our analysis. Consequently, 436 SNPs covering 336 genes were applicable to measure the imbalanced expression of paternal alleles in the offspring. By calculating the read ratios of parental alleles in the offspring, we identified seven genes showing allele-biased expression (p<0.05) including three previously reported and four newly identified genes in this study. Conclusion: The newly identified allele-specifically expressing genes in the neocortex of pigs should contribute to improving our knowledge on genomic imprinting in pigs. To our knowledge, this is the first study of allelic imbalance using high throughput analysis of both parental genomes and offspring transcriptomes of the reciprocal cross in outbred animals. Our study also showed the effect of the number of informative animals on the genome level investigation of allele-specific expression using RNA-seq analysis in livestock species.
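The read-ratio calculation mentioned in the Results can be illustrated with a short sketch. The abstract does not name the statistical test behind the p<0.05 threshold, so the exact two-sided binomial test against a 50:50 expectation used here is an assumption, and the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def allele_bias(paternal_reads, maternal_reads, alpha=0.05):
    """Return (paternal read ratio, p-value, biased?) for one informative SNP.

    Requires at least one read in total; intended for modest read counts.
    """
    n = paternal_reads + maternal_reads
    ratio = paternal_reads / n
    p_value = binomial_two_sided_p(paternal_reads, n)
    return ratio, p_value, p_value < alpha
```

For example, `allele_bias(18, 2)` reports a paternal read ratio of 0.9 and flags the SNP as biased, while `allele_bias(10, 10)` does not. A real analysis would also need to correct for multiple testing across SNPs, which this sketch omits.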

Detection of Neural Fates from Random Differentiation : Application of Support Vector Machine

  • Lee, Min-Su;Ahn, Jeong-Hyuck;Park, Woong-Yang
    • Genomics & Informatics
    • /
    • v.5 no.1
    • /
    • pp.1-5
    • /
    • 2007
  • Embryonic stem cells can be differentiated into various types of cells, which requires tight regulation of transcription. Biomarkers related to each cell lineage are used to guide differentiation into neural or other fates. In previous experiments, we reported guided differentiation (GD)-specific genes by comparison with profiles of random differentiation (RD). Interestingly, 68% of the differentially expressed genes in GD overlap with those of RD, which makes it difficult to separate the lineages by examining several markers. In this paper, we design a prediction model to distinguish differentiation into neural fates from any other lineage. From the profiles of 11,376 genes, 203 genes differentially expressed between neural and random differentiation were selected by a random variance t-test with 95% confidence and a 5% false discovery rate. Using a support vector machine algorithm, we selected 79 marker genes from the 203 informative genes to construct the optimal prediction model. Here we propose a model for predicting neural fates from random differentiation that achieves perfect accuracy.

Molecular phylogenetic relationship of the family Colchicaceae (Liliales)

  • Thi, Nguyen Pham Anh;Kim, Jung-Sung;Kim, Joo-Hwan
    • Proceedings of the Plant Resources Society of Korea Conference
    • /
    • 2012.05a
    • /
    • pp.19-19
    • /
    • 2012
  • The Colchicaceae, a moderate-sized family in Liliales comprising 250 species and 15-19 genera of rhizomatous or cormous perennials, is widely distributed through the temperate and tropical areas of Africa, Asia and North America. The division of Colchicaceae into two subfamilies is still unclear because previous studies produced different results. Moreover, the sister taxon of this family has not been determined. At the genus level, it was uncertain whether the expanded circumscription of the three genera Colchicum, Gloriosa, and Wurmbea, which include Androcymbium, Littonia and Onixotis, respectively, is reasonable. In this study, three coding genes, atpB, matK and rbcL, were analyzed to reconstruct the phylogenetic relationships of Colchicaceae, and both maximum parsimony (MP) and Bayesian analyses were conducted. Among the three genes, the matK region was the most variable and provided more parsimony-informative sites, whereas the atpB and rbcL regions were similar in variation and number of informative characters. Monophyly of Colchicaceae was strongly supported, and the family was divided into two subfamilies (Wurmbeoideae and Uvularioideae). The Uvularia-Disporum clade, which comprises the subfamily Uvularioideae, is sister to the rest of Colchicaceae, and Burchardia, which differentiated subsequently, was a sister within subfamily Wurmbeoideae. Burchardia had been supposed to be sister to the family in previous studies. The monophyly of the six tribes sensu Vinnersten and Manning (2007) within the family, and the phylogenetic relationships among them, were clear. In addition, the expanded circumscription of the three genera was strongly supported: Colchicum-Androcymbium (BP99), Wurmbea-Onixotis (BP100), and Littonia-Gloriosa (BP100). Here, we propose a re-circumscription among taxa of Colchicaceae.

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.5
    • /
    • pp.847-859
    • /
    • 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, so a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from among pre-selected genes that show relatively large variations between genes compared to total variations. Using copy-number changes of the selected genes, this study suggests a statistical approach that predicts patients' classes with increased performance by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) is proposed to pre-classify homogeneous patients and predict patients' classes for cancer prediction; a decision tree (DT) was combined with logistic regression on the set of informative genes. TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University; it predicted the patients' clinical diagnoses with perfect matches (except for one patient) among the patients classified as high-risk or low-risk, where prediction performance is critical because of the high sensitivity and specificity required for clinical treatment. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while the other classification methods used for comparison, CART and DT, performed worse than TLRM.
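For readers unfamiliar with LOOCV, the validation scheme named in this abstract can be sketched as below. This is only an illustration of the cross-validation loop: the nearest-centroid classifier is a stand-in chosen for brevity and is not the TLRM, and all function names are hypothetical.

```python
def loocv_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        hits += predict(model, samples[i]) == labels[i]
    return hits / len(samples)


# Stand-in classifier (NOT the TLRM from the abstract): nearest class centroid.
def fit_centroids(xs, ys):
    """Compute the mean feature vector of each class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

Any classifier can be plugged in through the `fit` and `predict` arguments, for example `loocv_accuracy(samples, labels, fit_centroids, predict_centroid)` with `samples` as a list of feature lists and `labels` as a parallel list of class names.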