• Title/Summary/Keyword: Gene ranking

Applying a modified AUC to gene ranking

  • Yu, Wenbao;Chang, Yuan-Chin Ivan;Park, Eunsik
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.307-319 / 2018
  • High-throughput technologies enable the simultaneous evaluation of thousands of genes that could discriminate between subclasses of complex diseases. Ranking genes according to differential expression is an important screening step for follow-up analysis, and many statistical measures have been proposed for this purpose. A good ranked list should be stable (at least for the top-ranked genes), and the top-ranked genes should have high power to differentiate disease statuses. However, the literature places little emphasis on ranking genes by these two criteria simultaneously. To meet both criteria, we propose applying a previously reported metric, the modified area under the receiver operating characteristic curve, to gene ranking. The proposed method is found to be promising in producing a stable ranking list and good prediction performance for the top-ranked genes. The findings are illustrated through studies on both synthesized data and real microarray gene expression data. The proposed method is recommended for ranking genes or other biomarkers in high-dimensional omics studies.
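
As a rough illustration of AUC-based gene ranking, the sketch below scores each gene by the ordinary empirical AUC (the Mann-Whitney form). It does not reproduce the modified AUC of the abstract, whose definition is not given here. The function names, the data layout (a dict of per-gene expression lists plus 0/1 labels), and the direction-agnostic `max(a, 1 - a)` step are all assumptions made for the example.

```python
def auc(cases, controls):
    """Empirical AUC: share of (case, control) pairs with case > control.

    Ties count as half a win.
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def rank_genes_by_auc(expr, labels):
    """Rank genes from most to least discriminating.

    expr   : dict mapping gene name -> list of expression values
    labels : list of 0/1 class labels aligned with the expression values
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        a = auc(cases, controls)
        # Assumption: treat up- and down-regulation as equally informative.
        scores[gene] = max(a, 1.0 - a)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three genes, two controls followed by two cases.
toy_expr = {"g1": [1, 2, 3, 4], "g2": [1, 3, 2, 4], "g3": [4, 3, 2, 1]}
toy_labels = [0, 0, 1, 1]
ranking = rank_genes_by_auc(toy_expr, toy_labels)
```

In this toy data, g1 and g3 separate the classes perfectly (in opposite directions) and g2 does not, so g2 lands last.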

Identifying Statistically Significant Gene-Sets by Gene Set Enrichment Analysis Using Fisher Criterion (Fisher Criterion을 이용한 Gene Set Enrichment Analysis 기반 유의 유전자 집합의 검출 방법 연구)

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.4 / pp.19-26 / 2008
  • Gene set enrichment analysis (GSEA) is a computational method that identifies gene sets showing statistically significant differences between two groups of microarray expression profiles and uncovers their biological meaning by employing gene annotation databases such as Cytogenetic Band, KEGG pathways, and Gene Ontology. In GSEA, all the genes in a given dataset are first ordered by the signal-to-noise ratio between the groups, and further analyses then proceed from that ordering. Despite its impressive results in several previous studies, gene ranking by the signal-to-noise ratio makes it difficult to consider highly up-regulated and highly down-regulated genes at the same time as candidate significant genes, a situation that can arise in metabolic and signaling pathways. To deal with this problem, we investigate GSEA with the Fisher criterion for gene ranking and evaluate its effects in leukemia-related pathway analyses.
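
The contrast between the two ranking scores can be sketched as follows. The formulas are the commonly used textbook forms, (μ1 − μ2)/(σ1 + σ2) for the signal-to-noise ratio and (μ1 − μ2)²/(σ1² + σ2²) for the Fisher criterion; the abstract does not state the exact variants used, so treat these as assumptions. The sample values are invented for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Signed score: positive when group a is higher, negative when lower."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def fisher_criterion(a, b):
    """Unsigned score: squared mean difference over the summed variances."""
    return (mean(a) - mean(b)) ** 2 / (stdev(a) ** 2 + stdev(b) ** 2)


# Hypothetical genes: one up-regulated and one down-regulated by the same amount.
up_a, up_b = [5, 6, 7], [1, 2, 3]
down_a, down_b = [1, 2, 3], [5, 6, 7]

snr_up = signal_to_noise(up_a, up_b)          # positive
snr_down = signal_to_noise(down_a, down_b)    # negative, same magnitude
fisher_up = fisher_criterion(up_a, up_b)
fisher_down = fisher_criterion(down_a, down_b)
```

Sorting by the signed ratio places the two genes at opposite ends of the list, whereas the Fisher criterion gives both the same score, so strongly up- and down-regulated genes rank together at the top.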

Evaluation of reference genes for RT-qPCR study in abalone Haliotis discus hannai during heavy metal overload stress

  • Lee, Sang Yoon;Nam, Yoon Kwon
    • Fisheries and Aquatic Sciences / v.19 no.4 / pp.21.1-21.11 / 2016
  • Background: Evaluating suitable reference genes as normalization controls is a prerequisite for any quantitative reverse transcription-PCR (RT-qPCR)-based expression study. To select stable reference genes in abalone Haliotis discus hannai tissues (gill and hepatopancreas) under heavy metal exposure (Cu, Zn, and Cd), 12 candidate housekeeping genes were assessed for expression stability using a comprehensive ranking that integrates four statistical algorithms (geNorm, NormFinder, BestKeeper, and the ΔCT method). Results: Expression stability in the gill subset was determined as RPL7 > RPL8 > ACTB > RPL3 > PPIB > RPL7A > EF1A > RPL4 > GAPDH > RPL5 > UBE2 > B-TU. In the hepatopancreas subset, the ranking was RPL7 > RPL3 > RPL8 > ACTB > RPL4 > EF1A > RPL5 > RPL7A > B-TU > UBE2 > PPIB > GAPDH. The pairwise variation assessed by the geNorm program indicates that two reference genes could be sufficient for accurate normalization in both the gill and hepatopancreas subsets. Overall, both subsets favored ribosomal protein genes (particularly RPL7) as stable references, whereas traditional housekeeping genes such as β-tubulin (B-TU) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were ranked as unstable. The reference gene selection was validated with a quantitative assay of MT transcripts. Conclusions: The present analysis shows the importance of validating reference genes with multiple algorithmic approaches to select genes that are truly stable. Our results indicate that the expression stability of a given reference gene is not always consistent across tissue types. The data from this study could be a good guide for the future design of RT-qPCR studies of metal regulation/detoxification and related physiology in this abalone species.

Comparative Statistic Module (CSM) for Significant Gene Selection

  • Kim, Young-Jin;Kim, Hyo-Mi;Kim, Sang-Bae;Park, Chan;Kimm, Kuchan;Koh, InSong
    • Genomics & Informatics / v.2 no.4 / pp.180-183 / 2004
  • Comparative Statistic Module (CSM) provides genomics researchers with a more reliable list of significant genes by reporting the genes selected in common and by suggesting a method of choice, ranking each statistical test by the average rank of the common genes across five statistical methods: the t-test, the Kruskal-Wallis (Wilcoxon signed rank) test, SAM, the two-sample multiple test, and the empirical Bayesian test. The module is implemented in Perl and R.

Prediction of hub genes of Alzheimer's disease using a protein interaction network and functional enrichment analysis

  • Wee, Jia Jin;Kumar, Suresh
    • Genomics & Informatics / v.18 no.4 / pp.39.1-39.8 / 2020
  • Alzheimer's disease (AD) is a chronic, progressive brain disorder that slowly destroys affected individuals' memory and reasoning faculties, and consequently, their ability to perform the simplest tasks. This study investigated the hub genes of AD. Proteins interact with other proteins and non-protein molecules, and these interactions play an important role in understanding protein function. Computational methods are useful for understanding biological problems, in particular, network analyses of protein-protein interactions. Through a protein network analysis, we identified the following top 10 hub genes associated with AD: PTGER3, C3AR1, NPY, ADCY2, CXCL12, CCR5, MTNR1A, CNR2, GRM2, and CXCL8. Through gene enrichment, it was identified that most gene functions could be classified as integral to the plasma membrane, G-protein coupled receptor activity, and cell communication under gene ontology, as well as involvement in signal transduction pathways. Based on the convergent functional genomics ranking, the prioritized genes were NPY, CXCL12, CCR5, and CNR2.
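
One common way to pick hub genes from an interaction network is to rank nodes by degree, that is, by their number of interaction partners. The abstract does not say which centrality measure was used, so degree is an assumption here, and the edge list and node names (`P1` to `P4`) are hypothetical placeholders rather than real interactions.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the highest degree in an undirected network.

    edges : iterable of (node_a, node_b) pairs, each listed once
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [node for node, _ in degree.most_common(k)]


# Hypothetical toy network: P1 has three partners, more than any other node.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
hubs = top_hubs(toy_edges, 1)
```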

Validation of housekeeping genes as candidate internal references for quantitative expression studies in healthy and nervous necrosis virus-infected seven-band grouper (Hyporthodus septemfasciatus)

  • Krishnan, Rahul;Qadiri, Syed Shariq Nazir;Kim, Jong-Oh;Kim, Jae-Ok;Oh, Myung-Joo
    • Fisheries and Aquatic Sciences / v.22 no.12 / pp.28.1-28.8 / 2019
  • Background: In the present study, we evaluated four commonly used housekeeping genes, viz., actin-β, elongation factor-1α (EF1α), acidic ribosomal protein (ARP), and glyceraldehyde 3-phosphate dehydrogenase (GAPDH), as internal references for quantitative analysis of immune genes in nervous necrosis virus (NNV)-infected seven-band grouper, Hyporthodus septemfasciatus. Methods: Expression profiles of the four genes were estimated in 12 tissues of healthy and infected seven-band grouper. Expression stability of the genes was calculated using the delta Ct method, BestKeeper, NormFinder, and geNorm algorithms. Consensus ranking was performed using RefFinder, and statistical analysis was done using GraphPad Prism 5.0. Results: Tissue-specific variations were observed in the four tested housekeeping genes of healthy and NNV-infected seven-band grouper. Fold change calculations for interferon-1 and Mx expression using the four housekeeping genes as internal references gave varied profiles for each tissue. EF1α and actin-β were the most stably expressed genes in tissues of healthy and NNV-infected seven-band grouper, respectively. Consensus ranking using RefFinder suggested EF1α as the least variable and most stable gene in both healthy and infected animals. Conclusions: These results suggest that EF1α is a better internal reference than the other genes tested in this study during the NNV infection process. This is a pilot study on the validation of reference genes in Hyporthodus septemfasciatus in the context of NNV infection.

Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression (효모 마이크로어레이 유전자 발현데이터에 대한 가우시안 과정 회귀를 이용한 유전자 선별 및 군집화)

  • Kim, Jaehee;Kim, Taehoun
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.389-399 / 2013
  • This article introduces Gaussian process regression and shows its application to time-course microarray gene expression data. Genes in yeast cell cycle microarray expression data are screened by a ratio of log marginal likelihoods computed from Gaussian process regression with a squared exponential covariance kernel. A Gaussian process regression is fitted to each gene, and the fits are shown for the nine top-ranking genes. Gaussian model-based clustering is then applied to the screened data, and silhouette values are calculated to assess cluster validity.

Re-Ranking Retrieval Model Using Similarity Transformation Based on Gene Algorithm (유전자 알고리즘 기반 유사도 변환을 이용한 순위 재조정 검색 모델)

  • 이재훈;이성주
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.331-334 / 2005
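  • With the development of information and communication science, vast amounts of information are being generated in diverse fields. As a result, retrieval models that return indiscriminate responses to user requests have also emerged. This paper studies a model that transforms the similarity between pieces of information and re-ranks the results so that more relevant information is presented at higher ranks, allowing users to obtain information better suited to their requests.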

Assessment of Suitable Reference Genes for RT-qPCR Normalization with Developmental Samples in Pacific Abalone Haliotis discus hannai

  • Lee, Sang Yoon;Park, Choul-Ji;Nam, Yoon Kwon
    • Journal of Animal Reproduction and Biotechnology / v.34 no.4 / pp.280-291 / 2019
  • The potential utility of 14 candidate housekeeping genes as normalization references for RT-qPCR analysis of developmental samples (fertilized eggs to late veliger larvae) in Pacific abalone Haliotis discus hannai was evaluated using four statistical algorithms (geNorm, NormFinder, BestKeeper, and the comparative ΔCT method). Different algorithms identified different genes as the best candidates, and the geometric mean-based final ranking, from the most to the least stable expression, was as follows: RPL5, RPL4, RPS18, RPL8, RPL7, UBE2, RPL7A, GAPDH, RPL36, PPIB, EF1A, ACTB, and B-TU. The findings were further validated by relative quantification of metallothionein (MT) transcripts using the stable and unstable reference genes; MT expression levels were strongly influenced by the choice of reference genes. Overall, our data suggest that RPL5 and RPS18, either singly or in combination, are appropriate for normalizing gene expression in developmental samples of this abalone species, whereas ACTB, B-TU, and EF1A are less stable and not recommended. In addition, our findings suggest that the standard deviation of the geometric ranking, as well as the geometric mean itself, should be taken into account in the final selection of reference gene(s). This study could be a useful basis for generating accurate and reliable RT-qPCR data with developmental samples in this abalone species.
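
A geometric mean-based consensus ranking of the kind described can be sketched as below: each gene receives a rank from every algorithm, and the geometric mean of those ranks orders the final list (smaller is more stable). The gene names `A`, `B`, `C`, the per-algorithm orderings, and the function name are invented for illustration; the actual outputs of geNorm, NormFinder, BestKeeper, and the ΔCT method are not reproduced here.

```python
from math import prod


def geometric_mean_ranking(rankings):
    """Combine several rankings into one by the geometric mean of ranks.

    rankings : dict mapping algorithm name -> list of genes, most stable first
               (every list is assumed to contain the same genes)
    Returns a list of (gene, geometric_mean_rank), most stable first.
    """
    genes = next(iter(rankings.values()))
    geo = {}
    for gene in genes:
        ranks = [order.index(gene) + 1 for order in rankings.values()]
        geo[gene] = prod(ranks) ** (1.0 / len(ranks))
    return sorted(geo.items(), key=lambda item: item[1])


# Hypothetical orderings from four algorithms for three placeholder genes.
toy_rankings = {
    "alg1": ["A", "B", "C"],
    "alg2": ["B", "A", "C"],
    "alg3": ["A", "C", "B"],
    "alg4": ["A", "B", "C"],
}
consensus = geometric_mean_ranking(toy_rankings)
```

Here gene A has ranks 1, 2, 1, 1, giving a geometric mean of 2^(1/4), and it comes first even though one algorithm ranked it second.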

Ranking Candidate Genes for the Biomarker Development in a Cancer Diagnostics

  • Kim, In-Young;Lee, Sun-Ho;Rha, Sun-Young;Kim, Byung-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.272-278 / 2004
  • Recently, Pepe et al. (2003) employed the receiver operating characteristic (ROC) approach to rank candidate genes from a microarray experiment for biomarker development, with the ultimate purpose of population screening for cancer. In a cancer microarray experiment based on n patients, the researcher often wants to compare the tumor tissue with the normal tissue within the same individual using a common reference RNA. This design is referred to as a reference design or an indirect design. Ideally, this experiment produces n pairs of microarray data, where each pair consists of two sets of microarray data resulting from reference versus normal tissue and reference versus tumor tissue hybridizations. However, for certain individuals either the normal tissue or the tumor tissue is too small to yield enough RNA for the microarray experiment, so values are missing in either the normal or the tumor tissue data. In practice, we have $n_1$ pairs of complete observations, $n_2$ 'normal only' and $n_3$ 'tumor only' observations for a microarray experiment with n patients, where $n = n_1 + n_2 + n_3$. We refer to this data set as a mixed data set, as it contains a mix of fully observed and partially observed paired data. Such a mixed data set was actually observed in a microarray experiment based on human tissues obtained during surgical operations on cancer patients. Pepe et al. (2003) provide the rationale for using the ROC approach based on two independent samples to rank candidate genes instead of using t or Mann-Whitney statistics. We first modify the ROC approach to ranking genes for a paired data set and then extend it to a mixed data set by taking a weighted average of the two ROC values obtained from the paired data set and from the two independent data sets.
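
The mixed-data idea can be sketched for a single gene as a weighted average of a paired score and an independent-sample AUC. The abstract gives neither the paired statistic nor the weights, so everything specific below is an assumption for illustration: the paired score is taken as the fraction of patients whose tumor value exceeds their own normal value, and the weight is proportional to the number of complete pairs versus the smaller of the two unpaired sample sizes.

```python
def _win(tumor, normal):
    """1 if tumor > normal, 0.5 on a tie, 0 otherwise."""
    if tumor > normal:
        return 1.0
    return 0.5 if tumor == normal else 0.0


def paired_score(pairs):
    """Fraction of (normal, tumor) pairs in which tumor exceeds normal."""
    return sum(_win(t, n) for n, t in pairs) / len(pairs)


def unpaired_auc(normal_only, tumor_only):
    """Empirical AUC over all cross comparisons of the unpaired samples."""
    total = sum(_win(t, n) for t in tumor_only for n in normal_only)
    return total / (len(normal_only) * len(tumor_only))


def mixed_score(pairs, normal_only, tumor_only):
    """Weighted average of the paired and unpaired scores.

    Assumption: weight by n1 (complete pairs) against min(n2, n3).
    """
    n1 = len(pairs)
    n_indep = min(len(normal_only), len(tumor_only))
    w = n1 / (n1 + n_indep)
    return w * paired_score(pairs) + (1.0 - w) * unpaired_auc(normal_only, tumor_only)


# Hypothetical values for one gene: 4 complete pairs, 2 normal-only, 2 tumor-only.
toy_pairs = [(1, 2), (2, 3), (3, 1), (1, 4)]
toy_normal_only = [1, 2]
toy_tumor_only = [3, 0]
score = mixed_score(toy_pairs, toy_normal_only, toy_tumor_only)
```

Computing this score for every gene and sorting in descending order would give a ranking; a different weighting scheme would change only the line that computes `w`.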
