A modified partial least squares regression for the analysis of gene expression data with survival information

  • Lee, So-Yoon (Credit Bureau Business Department, NICE Information Service)
  • Huh, Myung-Hoe (Department of Statistics, Korea University)
  • Park, Mira (Department of Preventive Medicine, Eulji University)
  • Received : 2014.06.30
  • Accepted : 2014.08.22
  • Published : 2014.09.30

Abstract

In DNA microarray studies, the number of genes far exceeds the number of samples, and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is a popular dimension reduction method, and several studies have found it useful for classifying microarray data. In this study, we propose a modified version of PLSR for analyzing gene expression data with survival information. The method is designed as a new gene selection method that combines PLSR with an iterative procedure for imputing censored survival times. The mean squared error of prediction (MSEP) criterion is used to determine the dimension of the model. To visualize the data, a plot of the variables superimposed with the samples is used. The method is applied to two microarray data sets, both containing survival times. The results show that the proposed method works well for interpreting gene expression microarray data.
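
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of "PLSR with an iterative procedure of imputing censored survival time", not the authors' implementation. It assumes a log-time response, a plain NIPALS PLS1 fit, an imputation rule that replaces each censored value with the larger of its censoring time and the current model prediction, cross-validated MSEP for choosing the number of components, and gene ranking by absolute regression coefficient. All function names (`pls1_fit`, `pls1_predict`, `impute_censored`, `msep_cv`) and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # score vector
        tt = t @ t
        p = Xk.T @ t / tt                  # X loading
        c = yk @ t / tt                    # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def impute_censored(X, time, event, n_components, n_iter=20, tol=1e-6):
    """Assumed rule: censored log-times become max(censoring time, prediction)."""
    y_obs = np.log(np.asarray(time, dtype=float))
    censored = ~np.asarray(event, dtype=bool)
    y = y_obs.copy()
    for _ in range(n_iter):
        pred = pls1_predict(pls1_fit(X, y, n_components), X)
        y_new = y_obs.copy()
        y_new[censored] = np.maximum(y_obs[censored], pred[censored])
        converged = np.max(np.abs(y_new - y)) < tol
        y = y_new
        if converged:
            break
    return y, pls1_fit(X, y, n_components)

def msep_cv(X, y, n_components, n_folds=5, seed=0):
    """K-fold cross-validated mean squared error of prediction."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = pls1_fit(X[train], y[train], n_components)
        sq_err += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return sq_err / len(y)

# Synthetic example: 40 samples, 200 "genes", random right censoring.
rng = np.random.default_rng(0)
n, p = 40, 200
X = rng.normal(size=(n, p))
true_log_time = 2.0 + 0.3 * X[:, :5].sum(axis=1) + rng.normal(scale=0.2, size=n)
censor_log_time = rng.normal(loc=2.5, scale=0.5, size=n)
event = true_log_time <= censor_log_time
time = np.exp(np.minimum(true_log_time, censor_log_time))

# Choose the model dimension by MSEP, then rank genes by |coefficient|.
best_k = min(range(1, 6),
             key=lambda k: msep_cv(X, impute_censored(X, time, event, k)[0], k))
y_imp, model = impute_censored(X, time, event, best_k)
top_genes = np.argsort(-np.abs(model[2]))[:10]
print(best_k, top_genes)
```

Taking the maximum with the censoring time keeps each imputed value consistent with the only fact known about a censored sample, namely that the true survival time is at least the observed one. Other imputation rules and other gene selection criteria would fit the abstract's description equally well.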

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
