Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting

  • Kim, Kyung-Sook (Dept. of Statistics, Chonnam National University) ;
  • Oh, Mi-Ra (Dept. of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Baek, Jang-Sun (Dept. of Statistics, Chonnam National University) ;
  • Son, Young-Sook (Dept. of Statistics, Chonnam National University)
  • Published : 2008.04.30

Abstract

Microarray gene expression data sets are large and the measurement process is complex, so missing values occur frequently. In this paper we propose the sequential partial least squares (SPLS) regression fitting method for estimating missing values in time course gene expression data, where observations are correlated across time points. The SPLS method combines a sequential technique with partial least squares (PLS) regression fitting. The usefulness of the proposed method is evaluated through several simulation studies on three yeast time course data sets.
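
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "sequential technique plus PLS regression fitting", not the authors' implementation. It assumes that genes are rows and time points are columns, that missing entries are marked with None, that genes with fewer missing values are imputed first, and that each imputed gene is then reused as a complete gene for later fits. The function names (pls1_predict, sequential_pls_impute), the choice of a single-response NIPALS PLS fit, and the default number of components are all illustrative assumptions.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_predict(X, y, x_new, n_components=2):
    """Fit single-response PLS (NIPALS) on X, y and predict y for x_new."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    x = [x_new[j] - x_mean[j] for j in range(p)]
    y_hat = y_mean
    for _ in range(min(n_components, p, n - 1)):
        # Weight vector: direction in X most covariant with the residual y.
        w = [dot([row[j] for row in Xc], yc) for j in range(p)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]  # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        load = [dot([row[j] for row in Xc], t) / tt for j in range(p)]
        q = dot(yc, t) / tt
        # Deflate the training data, then apply the same step to x_new.
        Xc = [[row[j] - t[i] * load[j] for j in range(p)]
              for i, row in enumerate(Xc)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        t_new = dot(x, w)
        y_hat += q * t_new
        x = [x[j] - t_new * load[j] for j in range(p)]
    return y_hat


def sequential_pls_impute(data, n_components=2):
    """Impute None entries gene by gene, reusing genes already imputed."""
    filled = [list(row) for row in data]
    complete = [i for i, row in enumerate(data) if None not in row]
    if not complete:
        raise ValueError("at least one complete gene is required")
    # Assumed ordering: genes with the fewest missing values come first.
    incomplete = sorted(
        (i for i, row in enumerate(data) if None in row),
        key=lambda i: sum(v is None for v in data[i]),
    )
    for i in incomplete:
        obs = [j for j, v in enumerate(data[i]) if v is not None]
        mis = [j for j, v in enumerate(data[i]) if v is None]
        X = [[filled[g][j] for j in obs] for g in complete]
        x_new = [data[i][j] for j in obs]
        for j in mis:
            y = [filled[g][j] for g in complete]
            filled[i][j] = pls1_predict(X, y, x_new, n_components)
        complete.append(i)  # sequential reuse of the imputed gene
    return filled
```

In this sketch the complete genes act as training samples, the observed time points of the target gene act as predictors, and each missing time point is fitted as a separate response. A real analysis would also need a rule for choosing the number of PLS components, which the abstract does not state.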
