Applying a modified AUC to gene ranking

  • Yu, Wenbao (Department of Statistics, Chonnam National University) ;
  • Chang, Yuan-Chin Ivan (Institute of Statistical Science, Academia Sinica) ;
  • Park, Eunsik (Department of Statistics, Chonnam National University)
  • Received : 2018.02.22
  • Accepted : 2018.04.16
  • Published : 2018.05.31

Abstract

High-throughput technologies enable the simultaneous evaluation of thousands of genes that could discriminate between subclasses of complex diseases. Ranking genes according to differential expression is an important screening step for follow-up analysis, and many statistical measures have been proposed for this purpose. A good ranked list should be stable (at least for the top-ranked genes), and the top-ranked genes should have high power to discriminate between disease statuses. However, the literature places little emphasis on ranking genes according to both criteria simultaneously. To meet both criteria, we propose applying a previously reported metric, the modified area under the receiver operating characteristic curve, to gene ranking. The proposed ranking method shows promise in producing a stable ranked list and good prediction performance for the top-ranked genes. The findings are illustrated through studies on both synthesized data and real microarray gene expression data. The proposed method is recommended for ranking genes or other biomarkers in high-dimensional omics studies.
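The general idea of AUC-based gene ranking can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not define the modified AUC, so the sketch substitutes the ordinary empirical AUC (the Mann-Whitney estimate) as a stand-in. The function names `empirical_auc` and `rank_genes_by_auc`, the genes-by-samples layout of `expr`, and the 0/1 coding of `labels` are hypothetical choices made for this example, not part of the paper.

```python
import numpy as np


def empirical_auc(cases, controls):
    """Ordinary empirical AUC: P(case > control) + 0.5 * P(case == control).

    Stand-in only; this is NOT the modified AUC used in the paper.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Compare every case value with every control value.
    diff = cases[:, None] - controls[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def rank_genes_by_auc(expr, labels):
    """Rank genes by how far their AUC lies from 0.5 (no discrimination).

    expr   : array of shape (n_genes, n_samples), assumed layout
    labels : array of 0 (control) / 1 (case), one per sample, assumed coding
    Returns (order, aucs), where order lists gene indices from most to
    least discriminating and aucs holds the per-gene AUC values.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    aucs = np.array(
        [empirical_auc(row[labels == 1], row[labels == 0]) for row in expr]
    )
    # Both over- and under-expression count, so score by |AUC - 0.5|.
    scores = np.abs(aucs - 0.5)
    order = np.argsort(-scores, kind="stable")
    return order, aucs


# Synthetic illustration: five genes, gene 2 shifted upward in cases.
rng = np.random.default_rng(0)
demo_labels = np.array([0] * 20 + [1] * 20)
demo_expr = rng.normal(size=(5, 40))
demo_expr[2, demo_labels == 1] += 3.0
demo_order, demo_aucs = rank_genes_by_auc(demo_expr, demo_labels)
```

Scoring by the distance from 0.5 treats over- and under-expressed genes symmetrically. Rank stability, the other criterion named in the abstract, could be examined by repeating the ranking on resampled data and comparing the resulting top-ranked lists.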
