Cross platform classification of microarrays by rank comparison

  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • Received : 2014.12.15
  • Accepted : 2015.01.25
  • Published : 2015.03.31

Abstract

Mining the microarray data accumulated in public data repositories can save experimental cost and time and provide valuable biomedical information. Big data analysis that pools multiple data sets increases statistical power, improves the reliability of the results, and reduces the study-specific bias of individual studies. However, integrating data sets from different studies raises many problems that must be dealt with. In this study, I limit the focus to cross-platform classification, in which the platform of a test sample differs from the platform of the training set, and suggest a simple classification method based on ranks. This method is compared with diagonal linear discriminant analysis, the k-nearest neighbor method, and the support vector machine, using real cross-platform data sets for two cancers.
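The abstract does not spell out the rank-based rule, so the following is only an illustrative sketch of the general idea, not the method proposed in the paper. It assumes that each sample's expression values are replaced by their within-sample ranks and that a test sample is assigned to the class whose mean rank profile is closest. The function names (rank_transform, fit_rank_centroids, predict_by_rank), the nearest-centroid rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rank_transform(X):
    """Replace each sample's values with within-sample ranks (1 = lowest).

    X has shape (n_samples, n_genes). Ties are broken by position,
    which is a simplification chosen for this sketch.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(X.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, X.shape[1] + 1)
    return ranks.astype(float)

def fit_rank_centroids(X_train, y_train):
    """Compute the mean rank profile of each class (assumed rule)."""
    R = rank_transform(X_train)
    y = np.asarray(y_train)
    labels = np.unique(y)
    centroids = np.vstack([R[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict_by_rank(X_test, labels, centroids):
    """Assign each test sample to the class with the nearest rank centroid."""
    R = rank_transform(X_test)
    # Squared Euclidean distance between rank profiles and class centroids.
    d = ((R[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[d.argmin(axis=1)]

# Toy training set on one "platform" (4 samples, 4 genes).
X_train = np.array([[1, 2, 3, 4],
                    [2, 3, 5, 9],
                    [4, 3, 2, 1],
                    [9, 5, 3, 2]])
y_train = ["A", "A", "B", "B"]

# Toy test samples on a different intensity scale, standing in for another platform.
X_test = np.array([[10, 20, 40, 80],
                   [800, 400, 200, 100]])

labels, centroids = fit_rank_centroids(X_train, y_train)
pred = predict_by_rank(X_test, labels, centroids)
print(pred)
```

Because ranks are unchanged by any monotone rescaling of a sample's intensities, the test samples are classified without first normalizing the two scales against each other, which is the appeal of rank-based approaches in the cross-platform setting.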
