Detection of Neural Fates from Random Differentiation: Application of Support Vector Machine

  • Lee, Min-Su (Department of Computer Science and Engineering, Ewha Womans University)
  • Ahn, Jeong-Hyuck (Laboratory of Molecular and Cellular Neuroscience, Rockefeller University)
  • Park, Woong-Yang (Human Genome Research Institute and Department of Biochemistry, Seoul National University College of Medicine)
  • Published : 2007.03.31

Abstract

Embryonic stem cells can differentiate into various cell types, a process that requires tight transcriptional regulation. Biomarkers specific to each cell lineage are used to guide differentiation toward neural or other fates. In previous experiments, we identified guided differentiation (GD)-specific genes by comparison with expression profiles of random differentiation (RD). Interestingly, 68% of the genes differentially expressed in GD overlap with those of RD, which makes it difficult to separate the lineages by examining only a few markers. In this paper, we design a prediction model that distinguishes differentiation into neural fates from differentiation into any other lineage. From the expression profiles of 11,376 genes, 203 genes differentially expressed between neural and random differentiation were selected by a random variance t-test with 95% confidence and a 5% false discovery rate. Using a support vector machine algorithm, we then selected 79 marker genes from these 203 informative genes to construct the optimal prediction model. The resulting model discriminates neural fates from random differentiation with perfect accuracy.
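
The marker-selection idea in the abstract (train a support vector machine on the informative genes, then keep only the genes the classifier relies on) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the function names `train_linear_svm`, `predict`, and `select_markers` are hypothetical, and the sketch swaps in plain batch subgradient descent with recursive elimination of the smallest-weight gene, which is not necessarily the procedure used in the paper. The t-test and false discovery rate filtering step is not covered.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit a linear soft-margin SVM (hinge loss + L2 penalty) by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (e.g. neural) or -1 (e.g. random) from the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def select_markers(X, y, n_keep):
    """Retrain repeatedly, dropping the gene with the smallest |weight| until n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        del kept[weakest]
    return kept

# Synthetic stand-in for expression data (assumption): 20 samples, 6 genes.
# Genes 0 and 1 differ between the two classes; genes 2-5 are pure noise.
rng = random.Random(0)
y = [1] * 10 + [-1] * 10
X = []
for label in y:
    informative = [label + rng.gauss(0, 0.1), -label + rng.gauss(0, 0.1)]
    noise = [rng.gauss(0, 0.1) for _ in range(4)]
    X.append(informative + noise)

selected = select_markers(X, y, n_keep=2)
X_sel = [[row[j] for j in selected] for row in X]
w, b = train_linear_svm(X_sel, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_sel, y)) / len(y)
print(selected, accuracy)
```

Note that the accuracy here is measured on the same samples used for training, so it only shows that the reduced gene set still separates the classes; a real evaluation would use cross-validation or a held-out set.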
