Model-based inverse regression for mixture data

  • Received : 2016.11.14
  • Accepted : 2016.12.13
  • Published : 2017.01.31

Abstract

This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, where the components have distinct central subspaces, and we adapt the model-based sliced inverse regression (MSIR) approach to such data in a simple and intuitive manner. Mixture probabilistic principal component analysis (MPPCA) is employed to estimate each component's central subspace and to cluster the data points. Results from simulation studies and a real data set show that our method captures the appropriate central subspaces and is robust to the choice of the number of slices. Discussions of root selection, estimation accuracy, and classification, together with the initial-value issues of MPPCA and related simulation results, are also provided.
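
To make the two-step pipeline concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: scikit-learn's GaussianMixture stands in for MPPCA as the clustering step, and classic sliced inverse regression (Li, 1991) is then run within each cluster to estimate a component-wise central subspace. All data, function names, and parameter choices here are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # stand-in for MPPCA clustering


def sir_directions(x, y, n_slices=5, n_dir=1):
    """Classic SIR (Li, 1991): leading eigenvectors of the between-slice
    covariance of the standardized slice means of the predictors."""
    n, p = x.shape
    mu = x.mean(axis=0)
    chol = np.linalg.cholesky(np.cov(x, rowvar=False))
    z = np.linalg.solve(chol, (x - mu).T).T            # standardized predictors
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    labels = np.searchsorted(edges[1:-1], y, side="right")
    m = np.zeros((p, p))
    for h in range(n_slices):                          # weighted slice means
        zh = z[labels == h]
        if zh.size:
            m += (len(zh) / n) * np.outer(zh.mean(axis=0), zh.mean(axis=0))
    vecs = np.linalg.eigh(m)[1][:, ::-1][:, :n_dir]    # leading eigenvectors
    beta = np.linalg.solve(chol.T, vecs)               # back to the x scale
    return beta / np.linalg.norm(beta, axis=0)


# Toy mixture: two well-separated components with distinct central subspaces.
rng = np.random.default_rng(0)
x1 = rng.normal(size=(500, 5))
y1 = x1[:, 0] + 0.1 * rng.normal(size=500)             # subspace span(e1)
x2 = rng.normal(size=(500, 5)) + 4.0
y2 = np.exp(0.5 * (x2[:, 1] - 4.0)) + 0.1 * rng.normal(size=500)  # span(e2)
x, y = np.vstack([x1, x2]), np.concatenate([y1, y2])

components = GaussianMixture(n_components=2, random_state=0).fit_predict(x)
for k in range(2):
    b = sir_directions(x[components == k], y[components == k])
    print(f"component {k}: leading SIR direction = {b.ravel().round(2)}")
```

In this toy setting, each printed direction should be close to the corresponding basis vector (e1 or e2), illustrating why clustering first lets SIR recover a separate central subspace per component instead of a single blurred one.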

References

  1. Anderson TW (1963). Asymptotic theory for principal component analysis, Annals of Mathematical Statistics, 34, 122-148. https://doi.org/10.1214/aoms/1177704248
  2. Anderson TW and Rubin H (1956). Statistical inference in factor analysis. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (Vol. 5, pp. 111-150), University of California Press, Berkeley, CA.
  3. Chen J, Li P, and Fu Y (2012). Inference on the order of a normal mixture, Journal of the American Statistical Association, 107, 1096-1105. https://doi.org/10.1080/01621459.2012.695668
  4. Cook RD (1994). Using dimension-reduction subspaces to identify important inputs in models of physical systems. In Proceedings of the Section on Physical and Engineering Sciences (pp. 18-25), American Statistical Association, Alexandria, VA.
  5. Cook RD (1998). Regression Graphics: Ideas for Studying Regressions through Graphics, John Wiley & Sons, New York.
  6. Cook RD and Weisberg S (1991). Comment on sliced inverse regression by K. C. Li, Journal of the American Statistical Association, 86, 328-332.
  7. Cook RD and Weisberg S (1994). An Introduction to Regression Graphics, John Wiley & Sons, New York.
  8. Gentle JE (2007). Matrix Algebra: Theory, Computations, and Applications in Statistics, Springer, New York.
  9. Jeffries NO (2003). A note on 'Testing the number of components in a normal mixture', Biometrika, 90, 991-994. https://doi.org/10.1093/biomet/90.4.991
  10. Jolliffe IT (2002). Principal Component Analysis (2nd ed), Springer, New York.
  11. Lawley DN (1953). A modified method of estimation in factor analysis and some large sample results. In Uppsala Symposium on Psychological Factor Analysis (pp. 35-42), Munksgaards, Copenhagen.
  12. Li B, Zha H, and Chiaromonte F (2005). Contour regression: a general approach to dimension reduction, Annals of Statistics, 33, 1580-1616. https://doi.org/10.1214/009053605000000192
  13. Li KC (1991). Sliced inverse regression for dimension reduction, Journal of the American Statistical Association, 86, 316-327. https://doi.org/10.1080/01621459.1991.10475035
  14. Li KC (1992). On principal Hessian directions for data visualization and dimension reduction: another application of Stein's lemma, Journal of the American Statistical Association, 87, 1025-1039. https://doi.org/10.1080/01621459.1992.10476258
  15. Lo Y (2005). Likelihood ratio tests of the number of components in a normal mixture with unequal variances, Statistics & Probability Letters, 71, 225-235. https://doi.org/10.1016/j.spl.2004.11.007
  16. Lo Y, Mendell NR, and Rubin DB (2001). Testing the number of components in a normal mixture, Biometrika, 88, 767-778. https://doi.org/10.1093/biomet/88.3.767
  17. Meyer CD (2000). Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA.
  18. Paisley J and Carin L (2009). Nonparametric factor analysis with beta process priors. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 777-784), ACM, New York.
  19. Scrucca L (2011). Model-based SIR for dimension reduction, Computational Statistics & Data Analysis, 55, 3010-3026. https://doi.org/10.1016/j.csda.2011.05.006
  20. Seo B and Kim D (2012). Root selection in normal mixture models, Computational Statistics & Data Analysis, 56, 2454-2470. https://doi.org/10.1016/j.csda.2012.01.022
  21. Tipping ME and Bishop CM (1999a). Probabilistic principal component analysis, Journal of the Royal Statistical Society Series B (Statistical Methodology), 61, 611-622. https://doi.org/10.1111/1467-9868.00196
  22. Tipping ME and Bishop CM (1999b). Mixtures of probabilistic principal component analyzers, Neural Computation, 11, 443-482. https://doi.org/10.1162/089976699300016728
  23. Vidal R (2011). Subspace clustering, IEEE Signal Processing Magazine, 28, 52-68. https://doi.org/10.1109/MSP.2010.939739
  24. Whittle P (1952). On principal components and least square methods of factor analysis, Scandinavian Actuarial Journal, 36, 223-239.
  25. Young G (1941). Maximum likelihood estimation and factor analysis, Psychometrika, 6, 49-53. https://doi.org/10.1007/BF02288574