Binary Segmentation Procedure for Detecting Change Points in a DNA Sequence

  • Published: 2005.04.01

Abstract

Locating homogeneous segments within a DNA sequence is a problem of practical interest. Suppose that the DNA sequence consists of segments within which the observations follow the same residue frequency distribution, and between which the observations follow different distributions. In this setting, change points correspond to the end points of these segments. This article explores the use of a binary segmentation procedure for detecting the change points in a DNA sequence. The change points are determined by a sequence of nested hypothesis tests of whether a change point exists. At each test, we compare a no-change-point model with a single-change-point model using the Bayesian information criterion. The method thus circumvents the computational complexity one would normally face in problems with an unknown number of change points. We illustrate the procedure by analyzing the genome of the bacteriophage lambda.
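The procedure described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not give the exact form of the criterion, so the code assumes a multinomial model over the alphabet `ACGT`, the usual BIC of the form -2 log L + p log n, and parameter counts of k - 1 for the no-change-point model and 2(k - 1) + 1 for the single-change-point model (counting the change location as one parameter). The function names `best_split` and `binary_segmentation` and the `min_len` parameter are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumed residue alphabet


def log_lik(counts, n):
    """Maximized multinomial log-likelihood of a segment of length n."""
    return sum(c * math.log(c / n) for c in counts if c > 0)


def best_split(seq, min_len=1):
    """Return the best single change point in seq, or None if BIC prefers no change.

    A return value t means the segment is split into seq[:t] and seq[t:].
    """
    n = len(seq)
    k = len(ALPHABET)
    if n < 2 * min_len or n < 2:
        return None
    index = {a: i for i, a in enumerate(ALPHABET)}
    total = [seq.count(a) for a in ALPHABET]
    # Assumed BIC for the no-change-point model: k - 1 free frequencies.
    bic_none = -2 * log_lik(total, n) + (k - 1) * math.log(n)
    left = [0] * k
    best_t, best_bic = None, float("inf")
    for t in range(1, n):
        left[index[seq[t - 1]]] += 1  # running counts for seq[:t]
        if t < min_len or n - t < min_len:
            continue
        right = [total[i] - left[i] for i in range(k)]
        ll = log_lik(left, t) + log_lik(right, n - t)
        # Assumed BIC for the single-change-point model:
        # two sets of frequencies plus the change location.
        bic_one = -2 * ll + (2 * (k - 1) + 1) * math.log(n)
        if bic_one < best_bic:
            best_t, best_bic = t, bic_one
    return best_t if best_bic < bic_none else None


def binary_segmentation(seq, offset=0, min_len=1):
    """Recursively split seq; return sorted change points in original coordinates."""
    t = best_split(seq, min_len)
    if t is None:
        return []  # nested test finds no change point: stop splitting here
    return (binary_segmentation(seq[:t], offset, min_len)
            + [offset + t]
            + binary_segmentation(seq[t:], offset + t, min_len))


if __name__ == "__main__":
    toy = "A" * 200 + "C" * 200  # synthetic sequence, not real genome data
    print(binary_segmentation(toy))
```

Each call tests only one segment for a single change point, so the number of change points never has to be fixed in advance; the recursion simply stops on any segment where the no-change-point model has the lower BIC.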
