On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon (Department of Statistics, Ewha Womans University) ;
  • Yoo, Younju (Department of Statistics, Ewha Womans University) ;
  • Um, Hye Yeon (Department of Statistics, Ewha Womans University) ;
  • Yoo, Jae Keun (Department of Statistics, Ewha Womans University)
  • Received : 2020.03.16
  • Accepted : 2020.04.17
  • Published : 2020.07.31

Abstract

The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm lacks reproducibility and nestness, which will be discussed in this paper. These are clear deficits of the K-means clustering algorithm; the hierarchical clustering algorithm, by contrast, possesses both reproducibility and nestness, yet an intensive comparison of the K-means and hierarchical clustering algorithms has not yet been carried out in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.
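
As a side note (not from the paper itself), the reproducibility and nestness properties contrasted above can be illustrated with a minimal Python sketch. It assumes NumPy, SciPy, and scikit-learn are available and uses a simulated univariate response, so the data and function choices below are illustrative only, not the authors' procedure.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score
    from scipy.cluster.hierarchy import linkage, fcluster

    # Simulated univariate "response" with three loose groups (illustrative only).
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(0, 1, 100),
                        rng.normal(3, 1, 100),
                        rng.normal(6, 1, 100)])
    Y = y.reshape(-1, 1)

    # Reproducibility: two K-means fits with different starting values may land in
    # different local optima, so the resulting slicing of y can change between runs.
    km_a = KMeans(n_clusters=3, n_init=1, random_state=0).fit_predict(Y)
    km_b = KMeans(n_clusters=3, n_init=1, random_state=7).fit_predict(Y)
    print("ARI between two K-means runs (1.0 = same partition):",
          adjusted_rand_score(km_a, km_b))

    # Nestness: cutting one Ward dendrogram at k and k+1 clusters always yields
    # nested partitions; separate K-means fits with k and k+1 clusters need not nest.
    Z = linkage(Y, method="ward")
    h3 = fcluster(Z, t=3, criterion="maxclust")
    h4 = fcluster(Z, t=4, criterion="maxclust")

    def is_nested(coarse, fine):
        # True if every fine cluster is contained in exactly one coarse cluster.
        return all(len(set(coarse[fine == g])) == 1 for g in np.unique(fine))

    km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y)
    km4 = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Y)
    print("Hierarchical partitions nested:", is_nested(h3, h4))
    print("K-means partitions nested:     ", is_nested(km3, km4))

Nestness holds for the hierarchical algorithm by construction: moving the cut of a single dendrogram to a coarser level can only merge existing clusters, whereas K-means solutions for different numbers of clusters are fitted independently.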

Acknowledgement

Chaeyeon Yoo's work was supported by the BK21 Plus Project through the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (22A20130011003). Jae Keun Yoo's work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (NRF-2019R1F1A1050715/2019R1A6A1A11051177).
