• Title/Summary/Keyword: Gene selection

Significant Gene Selection Using Integrated Microarray Data Set with Batch Effect

  • Kim Ki-Yeol;Chung Hyun-Cheol;Jeung Hei-Cheul;Shin Ji-Hye;Kim Tae-Soo;Rha Sun-Young
    • Genomics & Informatics
    • /
    • v.4 no.3
    • /
    • pp.110-117
    • /
    • 2006
  • In microarray technology, many experimental factors can introduce bias, including RNA sources, microarray production or platform differences, sample processing, and experimental protocols. These systematic effects are a substantial obstacle in the analysis of microarray data. When data sets derived from different experimental processes were used together, the analysis results were largely inconsistent and unreliable. Therefore, one of the most pressing challenges in the microarray field is how to combine data that come from two different groups. As a novel attempt to integrate two data sets with a batch effect, we simply applied standardization to the microarray data before significant gene selection. In the gene selection step, we used a newly defined measure that considers the distance between a gene and an ideal gene as well as the between-slide and within-slide variations. We also discuss the association between biological functions and the different expression patterns in the selected discriminative gene set. As a result, we could confirm that the batch effect was minimized by standardization and that the genes selected from the standardized data included various expression patterns and significant biological functions.
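
As a rough illustration of the standardization step, the sketch below z-scores one gene's expression values separately within each batch, which removes batch-specific shifts in location and scale before any gene selection. The abstract does not state the exact standardization used, so the per-gene, per-batch z-score, the function name `standardize_by_batch`, and the sample data are all assumptions.

```python
from statistics import mean, stdev

def standardize_by_batch(values, batches):
    """Z-score one gene's expression values within each batch.

    values  : expression levels of a single gene, one per sample
    batches : batch label of each sample (same length as values)
    Assumption: per-gene, per-batch z-scoring; the paper's exact
    standardization is not given in the abstract.
    """
    result = [0.0] * len(values)
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        group = [values[i] for i in idx]
        mu = mean(group)
        sd = stdev(group) if len(group) > 1 else 0.0
        for i in idx:
            # A constant or single-sample batch has no spread; map it to 0.
            result[i] = (values[i] - mu) / sd if sd > 0 else 0.0
    return result

# Hypothetical data: batch B is shifted upward by 5 units relative to batch A.
gene = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
batch = ["A", "A", "A", "B", "B", "B"]
z = standardize_by_batch(gene, batch)
```

After this transformation the two hypothetical batches share the same centered values, so the constant offset between them no longer dominates a downstream comparison.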

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm (유전알고리즘을 이용한 유전자발현 데이타상의 특징-분류기쌍 최적 앙상블 탐색)

  • 박찬호;조성배
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.525-536
    • /
    • 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. Generally, each specific tissue shows different expression levels in related genes, so diseases can be classified using gene expression profiles. Because not all genes are related to a disease, it is necessary to select the related genes, a step called feature selection, and to classify the selected genes properly. This paper proposes a GA-based method for searching for the optimal ensemble of feature-classifier pairs, composed from seven feature selection methods based on correlation, similarity, and information theory and six representative classifiers. In experiments with leave-one-out cross-validation on two cancer-related gene expression profiles, we find ensembles that are much superior to all individual feature-classifier pairs on the Lymphoma and Colon datasets.

Development of Gene-based Markers for the Allelic Selection of the Restorer-of-fertility Gene, Rfo, in Radish (Raphanus sativus)

  • Kim, Sunggil;Lim, Heerae;Cho, Kang-Hee;Park, Pue Hee;Park, Suhyung;Sung, Soon-Kee;Oh, Daegeun;Kim, Ki-Taek
    • Korean Journal of Breeding Science
    • /
    • v.41 no.3
    • /
    • pp.194-204
    • /
    • 2009
  • Cytoplasmic male sterility (CMS) and fertility restoration have been utilized as valuable tools for $F_1$-hybrid seed production in many crops despite laborious breeding processes. Molecular markers for the selection of CMS-related genes help reduce expenses and breeding times. A previously reported genomic region containing the Ppr-B gene, which is responsible for restoration of fertility and corresponds to the Rfo locus, was used to develop gene-based or so-called "functional" markers for allelic selection of the restorer-of-fertility gene (Rfo) in $F_1$-hybrid breeding of radish (Raphanus sativus L.). Polymorphic sequences among Rfo alleles of diverse radish breeding lines were examined by sequencing the Ppr-B alleles. However, the presence of a Ppr-B homolog, designated Ppr-D, interfered with specific PCR amplification of Ppr-B in certain breeding lines. The organization of Ppr-D, resolved by genome walking, revealed extended homology with Ppr-B even in the promoter region. Interestingly, PCR amplification of Ppr-D was repeatedly unsuccessful in certain breeding lines, implying the lack of Ppr-D in these radishes. Ppr-B could be successfully amplified for analysis only by designing primers based on sequences unique to Ppr-B, which exclude interference from the Ppr-D gene. Four variants of Rfo alleles were identified from 20 breeding lines. A combination of three molecular markers was developed to genotype the Rfo locus based on polymorphisms among the four variants. These markers will be useful in facilitating $F_1$-hybrid cultivar development in radish.

Effects of Artificial and Natural Selection on Walking Behavior in Drosophila melanogaster (초파리의 보행행동에 관한 인위도태와 자연도태에 의한 유전적 효과)

  • 주종길;이현화
    • The Korean Journal of Zoology
    • /
    • v.26 no.2
    • /
    • pp.95-106
    • /
    • 1983
  • Selection for rapid and slow walking behavior was carried out with populations derived from the Oregon-R and lethal-free strains of Drosophila melanogaster. The behavior was measured by means of a connected test-tube apparatus. The populations responded effectively to artificial selection and reached the selection plateau after 7 generations. The realized heritability for the first 10 generations was estimated to be about $9\sim14\%$ for rapid walking behavior and about $11\sim16\%$ for slow walking behavior. Hybridization analysis between the selected populations at generations 8 and 10 indicated that some polygenes for slow walking behavior were partially dominant over the polygenes controlling the rapid trait. Selection on the rapid and slow populations was relaxed after 10 generations. Under natural selection, the rapid population returned completely to its neutral state after only 5 generations. Such a phenomenon could be explained by genetic homeostasis resulting from the action of natural selection. However, the walking scores of the slow population did not differ from those attained under the original artificial selection. It seems reasonable to assume that slow walking behavior was possibly controlled by a major gene.
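
For reference, realized heritability is conventionally defined as the ratio of the response to selection to the selection differential (the breeder's equation). The abstract does not report the underlying values, so the symbols below are standard textbook notation rather than the authors' own.

```latex
h^2_{\text{realized}} = \frac{R}{S},
\qquad
R = \bar{x}_{\text{offspring}} - \bar{x}_{\text{population}},
\qquad
S = \bar{x}_{\text{selected parents}} - \bar{x}_{\text{population}}
```

Over several generations, $R$ and $S$ are usually taken as the cumulative response and the cumulative selection differential.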

Optimal Design for Marker-assisted Gene Pyramiding in Cross Population

  • Xu, L.Y.;Zhao, F.P.;Sheng, X.H.;Ren, H.X.;Zhang, L.;Wei, C.H.;Du, L.X.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.25 no.6
    • /
    • pp.772-784
    • /
    • 2012
  • Marker-assisted gene pyramiding aims to produce individuals with superior economic traits according to an optimal breeding scheme, which involves selecting a series of favorable target alleles after crossing base populations and pyramiding them into a single genotype. Inspired by evolutionary computation, we used the metaphor of hill-climbing to model the dynamic behavior of gene pyramiding. Considering traditional animal cross programs along with the features of animal segregating populations, four types of cross programs and two types of selection strategies for gene pyramiding were examined from a practical perspective. The cross programs are a two-population cross for pyramiding two genes (denoted II), a three-population cascading cross for pyramiding three genes (denoted III), and a four-population symmetric cross (denoted IIII-S) and cascading cross (denoted IIII-C) for pyramiding four genes. Various schemes (denoted cross program-A-E) were designed for each cross program given different levels of initial favorable allele frequency, base population size, and trait heritability. The gene pyramiding breeding process for the various schemes was simulated and compared based on the population Hamming distance, average superior genotype frequency, and average phenotypic value. The simulation results show that the larger the base population size and the higher the initial favorable allele frequency, the higher the efficiency of gene pyramiding. The parental cross order is shown to be the most important factor in a cascading cross but has no significant influence on the symmetric cross. The results also show that the genotypic selection strategy is superior to phenotypic selection in accelerating gene pyramiding. Moreover, the method and corresponding software were used to compare different cross schemes and selection strategies.
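
One of the comparison criteria named above, the population Hamming distance, can be sketched as the average number of allele positions at which individuals differ from the target genotype. The abstract does not define the measure precisely, so the simple averaging, the string encoding of genotypes, and the function names here are assumptions for illustration only.

```python
def hamming(genotype, target):
    """Number of allele positions where genotype differs from target."""
    if len(genotype) != len(target):
        raise ValueError("genotypes must have equal length")
    return sum(a != b for a, b in zip(genotype, target))

def population_hamming_distance(population, target):
    """Mean Hamming distance from each individual to the target genotype.

    Assumption: the population-level distance is the plain average over
    individuals; the paper's exact definition is not given in the abstract.
    """
    return sum(hamming(g, target) for g in population) / len(population)

# Hypothetical two-locus diploid genotypes, alleles coded 'A'/'a' and 'B'/'b'.
target = "AABB"
population = ["AABB", "AaBb", "aabb", "AABb"]
distance = population_hamming_distance(population, target)
```

Under this reading, a distance that shrinks toward zero across generations indicates that the population is converging on the pyramided genotype.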

A Node2Vec-Based Gene Expression Image Representation Method for Effectively Predicting Cancer Prognosis (암 예후를 효과적으로 예측하기 위한 Node2Vec 기반의 유전자 발현량 이미지 표현기법)

  • Choi, Jonghwan;Park, Sanghyun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.10
    • /
    • pp.397-402
    • /
    • 2019
  • Accurately predicting cancer prognosis to provide appropriate treatment strategies for patients is one of the critical challenges in bioinformatics. Many studies have proposed machine learning models that predict patients' outcomes from their gene expression data. Gene expression data is high-dimensional numerical data covering about 17,000 genes, so earlier studies used feature selection or dimensionality reduction to improve the performance of prognostic prediction models. These approaches, however, make it difficult for the predictive models to capture biological interactions between the selected genes, because the feature selection and model training stages are performed independently. In this paper, we propose a novel two-dimensional image formatting approach for gene expression data that achieves feature selection and prognostic prediction effectively. Node2Vec is used to integrate a biological interaction network with gene expression data, and a convolutional neural network learns from the integrated two-dimensional gene expression images and predicts cancer prognosis. We evaluated the proposed model through double cross-validation and confirmed that its prognostic prediction accuracy is superior to that of traditional machine learning models based on raw gene expression data. Because the proposed approach can improve prediction models without the loss of information caused by feature selection steps, we expect it to contribute to the development of personalized medicine.

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics
    • /
    • v.5 no.3
    • /
    • pp.95-101
    • /
    • 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<

An enhanced feature selection filter for classification of microarray cancer data

  • Mazumder, Dilwar Hussain;Veilumuthu, Ramachandran
    • ETRI Journal
    • /
    • v.41 no.3
    • /
    • pp.358-370
    • /
    • 2019
  • The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes the enhancement of the feature selection filter algorithm based on Joe's normalized mutual information and its use for gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets, namely, central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor, using five well-known classifiers, including the naive Bayes, radial basis function network, instance-based classifier, decision-based table, and decision tree. An average increase in the prediction accuracy of 5.1% is observed on all seven datasets averaged over all five classifiers. The average reduction in training time is 2.86 seconds. The performance of the proposed method is also compared with those of three other popular mutual information-based feature selection filters, namely, information gain, gain ratio, and symmetric uncertainty. The results are impressive when all five classifiers are used on all the datasets.
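
Of the filters mentioned above, symmetric uncertainty has a compact standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and is sketched below for discretized features. This is not the paper's proposed filter (which is based on Joe's normalized mutual information and is not reproduced here); the function names and the toy data are assumptions.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetric_uncertainty(xs, ys):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    return 2 * mutual_information(xs, ys) / denom if denom > 0 else 0.0

# Hypothetical discretized expression levels and class labels.
labels = ["tumor", "tumor", "normal", "normal"]
gene_informative = ["high", "high", "low", "low"]  # tracks the label exactly
gene_noise = ["high", "low", "high", "low"]        # independent of the label
su_informative = symmetric_uncertainty(gene_informative, labels)
su_noise = symmetric_uncertainty(gene_noise, labels)
```

A filter of this kind would rank `gene_informative` above `gene_noise`; information gain corresponds to the unnormalized `mutual_information` value.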

Expression of Porcine Epidemic Diarrhea Virus Spike Gene in Transgenic Carrot Plants

  • Kim, Young-Sook;Kwon, Tae-Ho;Yang, Moon-Sik
    • Plant Resources
    • /
    • v.6 no.2
    • /
    • pp.108-113
    • /
    • 2003
  • This study was carried out to obtain basic information on the feasibility of an oral vaccine in carrot using an Agrobacterium-mediated transformation system. The epitope region of the spike gene of porcine epidemic diarrhea virus (PEDV), which is classified as a member of the Coronaviridae and causes acute enteritis in pigs, was successfully expressed in carrot (Daucus carota) using the Agrobacterium-mediated transformation system. Hypocotyl segments of in vitro germinated plantlets were infected with Agrobacterium tumefaciens LBA 4404 harboring the PEDV spike gene. Embryogenic callus (EC) was induced on MS selection medium with 1 mg/L 2,4-D, 50 mg/L kanamycin and 300 mg/L cefotaxime after 45 days of culture. ECs subcultured on MS selection medium without 2,4-D were converted to somatic embryos (SE) of various stages: globular, heart, and torpedo. Putative transgenic embryos were selected on MS medium with 50 mg/L kanamycin and 300 mg/L cefotaxime. Plantlets were regenerated from transformed SE on MS medium containing 50 mg/L kanamycin after 30 days of culture. Genomic PCR confirmed the integration of the PEDV spike gene into the nuclear genome of carrot, and northern blot analysis demonstrated the expression of the PEDV spike gene in transgenic carrot.

Genetic Diversity and Clustering of the Rhoptry Associated Protein-1 of Plasmodium knowlesi from Peninsular Malaysia and Malaysian Borneo

  • Ummi Wahidah Azlan;Yee Ling Lau;Mun Yik Fong
    • Parasites, Hosts and Diseases
    • /
    • v.60 no.6
    • /
    • pp.393-400
    • /
    • 2022
  • Human infection with the simian malaria parasite Plasmodium knowlesi is a cause for concern in Southeast Asian countries, especially in Malaysia. A previous study of the P. knowlesi rhoptry associated protein-1 (PkRAP1) gene from Peninsular Malaysia discovered dimorphism in the gene. In this study, genetic analysis of PkRAP1 was conducted on a larger number of P. knowlesi samples from Malaysian Borneo. The PkRAP1 of these P. knowlesi isolates was PCR-amplified and sequenced. The newly obtained PkRAP1 gene sequences (n=34) were combined with those from the previous study (n=26) and analysed for polymorphism and natural selection. Sequence analysis revealed a higher genetic diversity of PkRAP1 than in the previous study. Exon II of the gene had higher diversity (π=0.0172) than exon I (π=0.0128). The diversity of the total coding region (π=0.0167) was much higher than that of RAP1 orthologues such as PfRAP-1 (π=0.0041) and PvRAP1 (π=0.00088). Z-test results indicated that the gene was under purifying selection. The phylogenetic tree and haplotype network showed distinct clustering of Peninsular Malaysia and Malaysian Borneo PkRAP1 haplotypes. This geographical clustering of PkRAP1 haplotypes provides further evidence of the dimorphism of the gene and of the possible existence of 2 distinct P. knowlesi lineages in Malaysia.
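
The π values quoted above are nucleotide diversity, commonly computed as the average proportion of differing sites over all pairs of aligned sequences. The sketch below shows that calculation on made-up sequences; it assumes equal-length alignments without gaps or ambiguous bases, and the function name and data are illustrative rather than taken from the study.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise proportion of differing sites (pi).

    Assumes aligned, equal-length sequences with no gaps or ambiguous
    bases; real analyses handle those cases explicitly.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)

# Hypothetical 10-bp alignment of three sequences (not PkRAP1 data).
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
pi = nucleotide_diversity(seqs)
```

Here the three pairs differ at 1, 2, and 1 sites respectively, giving 4 differences over 30 compared sites.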