A concise overview of principal support vector machines and its generalization

  • Received : 2024.01.19
  • Accepted : 2024.02.20
  • Published : 2024.03.31

Abstract

In high-dimensional data analysis, sufficient dimension reduction (SDR) has been considered an attractive tool for reducing the dimensionality of predictors while preserving regression information. The principal support vector machine (PSVM) (Li et al., 2011) offers a unified approach to both linear and nonlinear SDR. This article comprehensively explores a variety of SDR methods based on the PSVM, which we call principal machines (PM) for SDR. The PM achieves SDR by solving a sequence of convex optimization problems akin to those of popular supervised learning methods, such as the support vector machine, logistic regression, and quantile regression, to name a few. This makes the PM straightforward to handle and extend in both theoretical and computational aspects, as we will see throughout this article.
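To make the recipe described above concrete, the sketch below illustrates a linear PSVM-type estimator in the spirit of Li et al. (2011): the response is dichotomized at several cutpoints, a linear SVM is fit on the standardized predictors for each cutpoint, and the leading eigenvectors of the aggregated outer products of the fitted normal vectors give the estimated SDR directions. This is a minimal sketch, not the article's implementation; scikit-learn's LinearSVC is used as a stand-in for the exact PSVM objective, and the function name, slicing scheme, and tuning values are illustrative assumptions.

```python
# Minimal sketch of linear PSVM-type sufficient dimension reduction.
# Assumptions (not from the article): LinearSVC approximates the PSVM objective,
# 10 quantile cutpoints, and a nonsingular predictor covariance matrix.
import numpy as np
from sklearn.svm import LinearSVC

def psvm_directions(X, y, n_slices=10, d=1, C=1.0):
    """Estimate d SDR directions by aggregating linear SVM normal vectors."""
    n, p = X.shape
    X_bar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)

    # Standardize the predictors: Z = (X - X_bar) Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X_bar) @ Sigma_inv_half

    # Solve one convex (SVM) problem per cutpoint and aggregate the directions.
    M = np.zeros((p, p))
    cutpoints = np.quantile(y, np.linspace(0.1, 0.9, n_slices))
    for q in cutpoints:
        y_tilde = np.where(y > q, 1, -1)
        if len(np.unique(y_tilde)) < 2:
            continue
        svm = LinearSVC(C=C, fit_intercept=True, max_iter=10000).fit(Z, y_tilde)
        beta = svm.coef_.ravel()
        M += np.outer(beta, beta)

    # Leading eigenvectors of M span the standardized central subspace;
    # map them back to the original predictor scale.
    w, v = np.linalg.eigh(M)
    top = np.argsort(w)[::-1][:d]
    return Sigma_inv_half @ v[:, top]
```

The eigen-decomposition at the end is what turns a collection of per-cutpoint solutions into a common low-dimensional subspace, which is the step that distinguishes the PM framework from simply fitting many classifiers.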

Acknowledgement

This work was funded by National Research Foundation of Korea (NRF) grants (2023R1A2C1006587, 2022M3J6A1063595) and Korea University (K2302021).

References

  1. Akaho S (2001). A kernel method for canonical correlation analysis, Available from: arXiv preprint cs/0609071
  2. Artemiou A and Dong Y (2016). Sufficient dimension reduction via principal Lq support vector machine, Electronic Journal of Statistics, 10, 783-805.
  3. Artemiou A, Dong Y, and Shin SJ (2021). Real-time sufficient dimension reduction through principal least squares support vector machines, Pattern Recognition, 112, 107768.
  4. Banijamali E, Karimi A-H, and Ghodsi A (2018). Deep variational sufficient dimensionality reduction, Available from: arXiv preprint arXiv:1812.07641
  5. Bickel PJ and Levina E (2008). Regularized estimation of large covariance matrices, The Annals of Statistics, 36, 199-227.
  6. Bondell HD and Li L (2009). Shrinkage inverse regression estimation for model-free variable selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 287-299.
  7. Boyd SP and Vandenberghe L (2004). Convex Optimization, Cambridge University Press, Cambridge.
  8. Bura E, Forzani L, Arancibia RG, Llop P, and Tomassi D (2022). Sufficient reductions in regression with mixed predictors, The Journal of Machine Learning Research, 23, 4377-4423.
  9. Chun H and Keleş S (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection, Journal of the Royal Statistical Society Series B: Statistical Methodology, 72, 3-25.
  10. Cook RD (2004). Testing predictor contributions in sufficient dimension reduction, The Annals of Statistics, 32, 1062-1092.
  11. Cook RD (2007). Fisher lecture: Dimension reduction in regression, Statistical Science, 22, 1-26.
  12. Cook RD and Weisberg S (1991). Discussion of "sliced inverse regression for dimension reduction", Journal of the American Statistical Association, 86, 28-33.
  13. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 1348-1360.
  14. Forero PA, Cano A, and Giannakis GB (2010). Consensus-based distributed linear support vector machines, In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, Stockholm, 35-46.
  15. Fukumizu K, Bach FR, and Gretton A (2007). Statistical consistency of kernel canonical correlation analysis, Journal of Machine Learning Research, 8, 361-383.
  16. Hastie T, Tibshirani R, and Friedman J (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
  17. Hristache M, Juditsky A, Polzehl J, and Spokoiny V (2001). Structure adaptive approach for dimension reduction, Annals of Statistics, 29, 1537-1566.
  18. Jang HJ, Shin SJ, and Artemiou A (2023). Principal weighted least square support vector machine: An online dimension-reduction tool for binary classification, Computational Statistics & Data Analysis, 187, 107818.
  19. Jiang B, Zhang X, and Cai T (2008). Estimating the confidence interval for prediction errors of support vector machine classifiers, Journal of Machine Learning Research, 9, 521-540.
  20. Jin J, Ying C, and Yu Z (2019). Distributed estimation of principal support vector machines for sufficient dimension reduction, Available from: arXiv preprint arXiv:1911.12732
  21. Kang J and Shin SJ (2022). A forward approach for sufficient dimension reduction in binary classification, The Journal of Machine Learning Research, 23, 9025-9055.
  22. Kapla D, Fertl L, and Bura E (2022). Fusing sufficient dimension reduction with neural networks, Computational Statistics & Data Analysis, 168, 107390.
  23. Kim B and Shin SJ (2019). Principal weighted logistic regression for sufficient dimension reduction in binary classification, Journal of the Korean Statistical Society, 48, 194-206.
  24. Kim H, Howland P, Park H, and Cristianini N (2005). Dimension reduction in text classification with support vector machines, Journal of Machine Learning Research, 6, 37-53.
  25. Kim K, Li B, Zhou Y, and Li L (2020). On post dimension reduction statistical inference, The Annals of Statistics, 48, 1567-1592.
  26. Koenker R and Bassett G (1978). Regression quantiles, Econometrica, 46, 33-50.
  27. Kong E and Xia Y (2014). An adaptive composite quantile approach to dimension reduction, The Annals of Statistics, 42, 1657-1688.
  28. Lee KY, Li B, and Chiaromonte F (2013). A general theory for nonlinear sufficient dimension reduction: Formulation and estimation, The Annals of Statistics, 41, 221-249.
  29. Li B (2018). Sufficient Dimension Reduction: Methods and Applications with R, CRC Press, Boca Raton, FL.
  30. Li B, Artemiou A, and Li L (2011). Principal support vector machines for linear and nonlinear sufficient dimension reduction, The Annals of Statistics, 39, 3182-3210.
  31. Li B and Wang S (2007). On directional regression for dimension reduction, Journal of the American Statistical Association, 102, 997-1008.
  32. Li B, Zha H, and Chiaromonte F (2005). Contour regression: A general approach to dimension reduction, The Annals of Statistics, 33, 1580-1616.
  33. Li K-C (1991). Sliced inverse regression for dimension reduction (with discussion), Journal of the American Statistical Association, 86, 316-342.
  34. Li K-C (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma, Journal of the American Statistical Association, 87, 1025-1039.
  35. Li L (2007). Sparse sufficient dimension reduction, Biometrika, 94, 603-613.
  36. Li L (2010). Dimension reduction for high-dimensional data, Statistical Methods in Molecular Biology, 620, 417-434.
  37. Pearson K (1901). On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2, 559-572.
  38. Power MD and Dong Y (2021). Bayesian model averaging sliced inverse regression, Statistics & Probability Letters, 174, 109103.
  39. Quach H and Li B (2023). On forward sufficient dimension reduction for categorical and ordinal responses, Electronic Journal of Statistics, 17, 980-1006.
  40. Reich BJ, Bondell HD, and Li L (2011). Sufficient dimension reduction via Bayesian mixture modeling, Biometrics, 67, 886-895.
  41. Shin SJ and Artemiou A (2017). Penalized principal logistic regression for sparse sufficient dimension reduction, Computational Statistics & Data Analysis, 111, 48-58.
  42. Shin SJ, Wu Y, Zhang HH, and Liu Y (2017). Principal weighted support vector machines for sufficient dimension reduction in binary classification, Biometrika, 104, 67-81.
  43. Soale A-N and Dong Y (2022). On sufficient dimension reduction via principal asymmetric least squares, Journal of Nonparametric Statistics, 34, 77-94.
  44. Tibshirani R (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B: Statistical Methodology, 58, 267-288.
  45. Van der Vaart AW (2000). Asymptotic Statistics, Cambridge University Press, Cambridge.
  46. Vapnik V (1999). The Nature of Statistical Learning Theory, Springer Science & Business Media, New York.
  47. Wahba G (1999). Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Advances in Kernel Methods-Support Vector Learning, 6, 69-87.
  48. Wang C, Shin SJ, and Wu Y (2018). Principal quantile regression for sufficient dimension reduction with heteroscedasticity, Electronic Journal of Statistics, 12, 2114-2140.
  49. Weng J and Young DS (2017). Some dimension reduction strategies for the analysis of survey data, Journal of Big Data, 4, 1-19.
  50. Wu H-M (2008). Kernel sliced inverse regression with applications to classification, Journal of Computational and Graphical Statistics, 17, 590-610.
  51. Wu Y and Li L (2011). Asymptotic properties of sufficient dimension reduction with a diverging number of predictors, Statistica Sinica, 21, 707-730.
  52. Xia Y (2007). A constructive approach to the estimation of dimension reduction directions, The Annals of Statistics, 35, 2654-2690.
  53. Xia Y, Tong H, Li WK, and Zhu L (2002). An adaptive estimation of dimension reduction space, Journal of the Royal Statistical Society Series B: Statistical Methodology, 64, 363-410.
  54. Yin X and Hilafu H (2015). Sequential sufficient dimension reduction for large p, small n problems, Journal of the Royal Statistical Society Series B: Statistical Methodology, 77, 879-892.
  55. Yin X, Li B, and Cook RD (2008). Successive direction extraction for estimating the central subspace in a multiple-index regression, Journal of Multivariate Analysis, 99, 1733-1757.
  56. Yuan M and Lin Y (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B: Statistical Methodology, 68, 49-67.
  57. Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty, The Annals of Statistics, 38, 894-942.
  58. Zhu L-P, Zhu L-X, and Feng Z-H (2010). Dimension reduction in regressions through cumulative slicing estimation, Journal of the American Statistical Association, 105, 1455-1466.
  59. Zou H (2006). The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429.
  60. Zou H and Hastie T (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B: Statistical Methodology, 67, 301-320.