• Title/Summary/Keyword: gene information

An information-theoretical analysis of gene nucleotide sequence structuredness for a selection of aging and cancer-related genes

  • Blokh, David;Gitarts, Joseph;Stambler, Ilia
    • Genomics & Informatics
    • /
    • v.18 no.4
    • /
    • pp.41.1-41.8
    • /
    • 2020
  • We provide an algorithm for the construction and analysis of autocorrelation (information) functions of gene nucleotide sequences. As a measure of correlation between discrete random variables, we use normalized mutual information. The information functions are indicative of the degree of structuredness of gene sequences. We construct the information functions for selected gene sequences. We find a significant difference between information functions of genes of different types. We hypothesize that the features of information functions of gene nucleotide sequences are related to phenotypes of these genes.
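The information function described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes the function at lag k is the normalized mutual information between the sequence and a copy of itself shifted by k positions, and it assumes normalization by the smaller of the two marginal entropies. The abstract does not state which normalization the paper uses. The names `normalized_mutual_information` and `information_function` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (in bits) of an empirical distribution given as counts."""
    return -sum(c / total * log2(c / total) for c in counts.values())


def normalized_mutual_information(xs, ys):
    """I(X;Y) / min(H(X), H(Y)). Assumed normalization; returns 0.0 if undefined."""
    n = len(xs)
    hx = entropy(Counter(xs), n)
    hy = entropy(Counter(ys), n)
    hxy = entropy(Counter(zip(xs, ys)), n)
    mi = hx + hy - hxy
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


def information_function(seq, max_lag):
    """NMI between the sequence and itself shifted by each lag 1..max_lag."""
    return [
        normalized_mutual_information(seq[:-k], seq[k:])
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy nucleotide sequence, for illustration only.
    print(information_function("ACGTTGCAACGTACGTTTGACA", 5))
```

Under these assumptions, a strictly periodic sequence gives values near 1 at lags matching its period, while a sequence whose positions are independent gives values near 0.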

Gene Selection using Principal Component Analysis for Molecular classification (Principal Component Analysis를 이용한 Gene Selection)

  • Lim Soo-Hong;Sohn Kirack;Hong Sung-Yong
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07b
    • /
    • pp.259-261
    • /
    • 2005
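  • DNA microarray studies, which generate thousands of gene expression measurements, collect gene expression information useful for diagnosis from tissue and cell samples. New methods using support vector machines (SVMs) have been studied for analyzing this kind of data. In this paper, we describe an algorithm that uses the eigenvectors of gene expression data to improve SVM performance and to find genes useful for disease diagnosis. Genes are selectively included in SVM learning according to the eigenvectors, and the classification results reveal the influence of each added gene on disease diagnosis, which can be used to understand the role of genes in disease.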
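One way to picture eigenvector-based gene ranking is the sketch below. It is an assumption-laden illustration, not the paper's algorithm: it ranks genes by the absolute value of their loading on the first principal component, found by power iteration, and it leaves out the SVM training step entirely. The function names and the toy data are hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the gene covariance matrix via power iteration.

    rows: list of samples, each a list of expression values (one per gene).
    The all-ones start vector is an arbitrary choice; it would fail only if it
    were exactly orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0] * p
    for _ in range(iterations):
        # Apply X^T X to v without forming the p-by-p matrix explicitly.
        scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def rank_genes(rows, gene_names):
    """Gene names ordered by decreasing absolute loading on the first component."""
    loadings = first_principal_component(rows)
    order = sorted(range(len(gene_names)), key=lambda j: -abs(loadings[j]))
    return [gene_names[j] for j in order]


if __name__ == "__main__":
    # Four samples, three hypothetical genes.
    data = [[0, 1.0, 5], [10, 1.1, 5], [0, 0.9, 5], [10, 1.0, 5]]
    print(rank_genes(data, ["geneA", "geneB", "geneC"]))
```

In a workflow like the one the abstract describes, the top-ranked genes would then be added to the classifier's feature set one group at a time and the change in classification accuracy observed.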

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems
    • /
    • v.19 no.5
    • /
    • pp.652-662
    • /
    • 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model and are often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, this study examines ensemble gene selection methods based on multiple-tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets show that the multiple-tree module union is significantly superior, in classification accuracy, to gene selection based on a single tree model and to other competitive gene selection methods.
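The intersection and union ensembles can be illustrated with plain set operations. This is a hedged sketch: the per-model importance scores are invented for illustration, the helper names are hypothetical, and the cross-union variant is omitted because the abstract does not define it.

```python
def top_n(scores, n):
    """Names of the n highest-scoring genes; ties are broken by name."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:n])


def ensemble_selection(score_tables, n):
    """Combine the top-n gene sets from several models by intersection and union."""
    selections = [top_n(scores, n) for scores in score_tables]
    return {
        "intersection": set.intersection(*selections),
        "union": set.union(*selections),
    }


if __name__ == "__main__":
    # Hypothetical importance scores from three tree models.
    id3 = {"g1": 0.9, "g2": 0.5, "g3": 0.1, "g4": 0.0}
    random_forest = {"g1": 0.7, "g2": 0.1, "g3": 0.6, "g4": 0.2}
    gbdt = {"g1": 0.8, "g2": 0.3, "g3": 0.2, "g4": 0.4}
    print(ensemble_selection([id3, random_forest, gbdt], n=2))
```

The intersection keeps only the genes every model agrees on, so it tends to be small. The union keeps any gene that at least one model ranks highly, which loses less information at the cost of a larger feature set.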

Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

  • Huynh, Phuoc-Hai;Nguyen, Van Hoa;Do, Thanh-Nghi
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.1
    • /
    • pp.14-20
    • /
    • 2019
  • Microarray gene expression data are currently used to classify cancers, which helps address problems relating to cancer causes and treatment regimens. However, the sample size of gene expression data is often restricted because microarray technology is expensive to use in human studies. We propose enhancing the gene expression classification of support vector machines with generative adversarial networks (GAN-SVMs). A GAN that generates new data from the original training datasets was implemented. The GAN was used in conjunction with nonlinear SVMs that efficiently classify gene expression data. Numerical test results on 20 low-sample-size and very high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories indicate that the model is more accurate than state-of-the-art classification models.

Integration Scheme of Gene Information based on Anatomical Structure (해부학적 구조를 이용한 유전자 정보 통합 기법)

  • Yang, Gi-Chul
    • Journal of Digital Convergence
    • /
    • v.13 no.2
    • /
    • pp.153-158
    • /
    • 2015
  • Biologists are pursuing genetics-related research that can provide the core information needed to understand a given cancer or inherent disease. However, biological experiments can produce different results because of differences in experimental elements or environments and/or differences in interpretation. Existing research results can therefore provide conflicting information. These inconsistencies can be found by integrating gene information. Biologists can save time and effort in finding particular gene information if the gene information is integrated without inconsistency. This paper introduces an efficient scheme for integrating and augmenting gene information generated by different studies.

Sweet spot search of multi peak beam using Genetic Algorithm (Genetic Algorithm을 이용한 멀티 피크 빔의 최적방향탐색)

  • Hwang Jong Woo;Lim Sung Jin;Eom Ki Hwan;Sato Yoichi
    • Proceedings of the IEEK Conference
    • /
    • 2004.06a
    • /
    • pp.301-304
    • /
    • 2004
  • In this paper, we propose a method that uses a genetic algorithm to find the optimal direction of the multi-beam between stations on a point-to-point link. In the proposed method, the maximum value in the optimal direction at each station is used as the fitness function. A millimeter-wave beam produces many peaks because it is strongly affected by noise. We simulated the method with 16-bit, 32-bit, and 32-bit split gene encodings. The 32-bit split encoding uses 16-bit gene information: each antenna builds its 32-bit gene information by adding the gene information of two antennas, each having a 16-bit gene. With the proposed method, we obtained good output without full 32-bit gene information.

A modified partial least squares regression for the analysis of gene expression data with survival information

  • Lee, So-Yoon;Huh, Myung-Hoe;Park, Mira
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.5
    • /
    • pp.1151-1160
    • /
    • 2014
  • In DNA microarray studies, the number of genes far exceeds the number of samples and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is a popular method for dimension reduction, and several studies have found it useful for classifying microarray data. In this study, we suggest a modified version of partial least squares regression for analyzing gene expression data with survival information. The method is designed as a new gene selection method that uses PLSR with an iterative procedure for imputing censored survival times. The mean square error of prediction criterion is used to determine the dimension of the model. To visualize the data, plots of variables superimposed with samples are used. The method is applied to two microarray data sets, both containing survival times. The results show that the proposed method works well for interpreting gene expression microarray data.

Xenie: Integration of Human 'gene to function' information in human readable & machine usable way

  • Ahn, Tae-Jin
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2000.11a
    • /
    • pp.53-55
    • /
    • 2000
  • Xenie is a Java application that integrates and represents 'gene to function' information for human genes. Xenie extracts data from several heterogeneous molecular biology databases and provides integrated information in a human-readable and machine-usable way. We defined 7 semantic frame classes (Gene, Transcript, Polypeptide, Protein_complex, Isotype, Functional_object, and Cell) as a common schema for storing and integrating gene-to-function information and relationships. Each of the 7 semantic frame classes has data fields that store biological data such as gene symbol, disease information, cofactors, and inhibitors. Using these semantic classes, Xenie can show how many transcripts and polypeptides are known and what the function of the gene products is in general. In detail, Xenie provides functional information for a given human gene in the fields of semantic objects that store integrated data from several databases (Brenda, GDB, Genecards, HGMD, HUGO, LocusLink, OMIM, PIR, and SWISS-PROT). Although Xenie provides a fully readable XML document for human researchers, the main goal of the Xenie system is to provide integrated data for other bioinformatics applications. Technically, Xenie provides two output formats: a Java persistent object and an XML document, both of which are widely favored for data exchange. Additionally, UML designs of Xenie and DTDs for the 7 semantic frame classes are available for easy data binding to other bioinformatics application systems. Xenie's output may provide more detailed and integrated information to bioinformatics systems such as gene chip, 2D gel, and biopathway-related systems. Furthermore, through data integration, Xenie can also allow other bioinformatics systems to ask 'function based queries' that were previously impossible to answer because the data were stored separately in heterogeneous databases.

Reverse Engineering of a Gene Regulatory Network from Time-Series Data Using Mutual Information

  • Barman, Shohag;Kwon, Yung-Keun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.11a
    • /
    • pp.849-852
    • /
    • 2014
  • Reverse engineering of a gene regulatory network is a challenging task in computational biology. Reverse engineering here means detecting regulatory relationships among genes from time-series data. It helps to discover the architecture of the underlying gene regulatory network, and it also provides insight into disease processes, biological processes, and drug discovery. Many statistical approaches are available for reverse engineering of gene regulatory networks. In this paper, we propose pairwise mutual information for the reverse engineering of a gene regulatory network from time-series data. First, we create random Boolean networks using the well-known Erdős-Rényi model. Second, we generate artificial time-series data from each network. Then, we calculate pairwise mutual information to predict the network. We implement our system on the Java platform. To visualize the random Boolean network graphically, we use Cytoscape plugins 2.8.0.
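The three steps above (random network, artificial time series, pairwise mutual information) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Java implementation: the random truth tables, the rule that a node without regulators keeps its state, the single synchronous trajectory, and the one-step time lag in the mutual information are all assumptions, and every function name is hypothetical.

```python
import random
from collections import Counter
from math import log2


def random_boolean_network(n, p, rng):
    """Erdős-Rényi wiring: each edge j -> i (j != i) is present with probability p.
    Each node gets a random truth table over its regulators (assumption)."""
    regulators = [[j for j in range(n) if j != i and rng.random() < p] for i in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** len(regs))] for regs in regulators]
    return regulators, tables


def simulate(regulators, tables, state, steps):
    """Synchronous updates; a node with no regulators keeps its state (assumption)."""
    series = [tuple(state)]
    for _ in range(steps):
        nxt = []
        for i, regs in enumerate(regulators):
            if regs:
                index = sum(state[j] << bit for bit, j in enumerate(regs))
                nxt.append(tables[i][index])
            else:
                nxt.append(state[i])
        state = nxt
        series.append(tuple(state))
    return series


def mutual_information(xs, ys):
    """Empirical mutual information (bits) between two equal-length sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def lagged_pairwise_mi(series):
    """scores[j][i] = MI between gene j at time t and gene i at time t + 1."""
    n = len(series[0])
    return [
        [mutual_information([s[j] for s in series[:-1]], [s[i] for s in series[1:]])
         for i in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    regs, tables = random_boolean_network(5, 0.3, rng)
    start = [rng.randint(0, 1) for _ in range(5)]
    scores = lagged_pairwise_mi(simulate(regs, tables, start, 50))
    print(regs)
    print(scores)
```

A predicted network could then be read off by keeping the pairs whose score exceeds a chosen threshold. Because a deterministic Boolean network settles into an attractor quickly, a single trajectory carries limited information, and pooling transitions from many random initial states would be a natural extension.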

GO Guide : Browser & Query Translation for Biological Ontology (GO Guide : 생물학 온톨로지를 위한 브라우저 및 질의 변환)

  • Jung Jun-Won;Park Hyoung-Woo;Im Dong-Hhyuk;Lee Kang-Pyo;Kim Hyoung-Joo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.12 no.3
    • /
    • pp.183-191
    • /
    • 2006
  • As genetic research becomes more active, the field of biology needs structured data about genes. The Gene Ontology Consortium has therefore constructed genetic information in OWL, the ontology description language published by the W3C. However, previous browsers for Gene Ontology support only simple search mechanisms based on keywords, trees, and graphs, and cannot retrieve high-quality information that takes various relationships into account. In this paper, we suggest a browsing technique that integrates various search methods to support researchers who actually perform experiments in the biology field. In addition, instead of requiring the user to type a query, we propose a query generation technique that constructs a query while browsing and a query translation technique that translates the generated query into a SeRQL query. This is convenient for users and enables them to obtain high-quality information. With the GO Guide browser, we show that the information in Gene Ontology can be used efficiently.