Adaptive ridge procedure for L0-penalized weighted support vector machines

  • Received : 2017.09.29
  • Accepted : 2017.11.01
  • Published : 2017.11.30

Abstract

Although the $L_0$-penalty is the most natural choice for identifying the sparsity structure of a model, it has not been widely used because of its computational burden. Recently, the adaptive ridge procedure was developed to efficiently approximate an $L_q$-penalized problem by an iteratively reweighted $L_2$-penalized one. In this article, we propose applying the adaptive ridge procedure to the $L_0$-penalized weighted support vector machine (WSVM) to facilitate the corresponding optimization. Our numerical investigation shows the advantageous performance of the $L_0$-penalized WSVM compared to the conventional WSVM with the $L_2$ penalty on both simulated and real data sets.
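
To make the iteration concrete, here is a minimal Python sketch of the adaptive ridge idea applied to a linear WSVM: each pass solves a weighted $L_2$-penalized subproblem and then refreshes the ridge weights as $w_j = (\beta_j^2 + \delta^2)^{-1}$, following Frommlet and Nuel (2016), so that $w_j\beta_j^2$ approaches the indicator $1\{\beta_j \neq 0\}$ and the penalty mimics the $L_0$ count. This is an illustration under stated assumptions, not the authors' implementation: the name `fit_l0_wsvm` and the parameters `delta` and `n_iter` are hypothetical, a squared hinge loss stands in for the hinge loss so each subproblem is smooth, and a general-purpose optimizer replaces a dedicated solver.

```python
# Illustrative sketch only: adaptive ridge approximation of an
# L0-penalized linear weighted SVM (squared hinge loss for smoothness).
import numpy as np
from scipy.optimize import minimize

def fit_l0_wsvm(X, y, pi, lam=1.0, delta=1e-4, n_iter=20):
    """X: (n, p) design; y: labels in {-1, +1}; pi: (n,) WSVM
    observation weights; lam: penalty level; delta: AR smoothing."""
    _, p = X.shape
    beta = np.zeros(p)
    w = np.ones(p)  # ridge weights, refreshed after every pass

    def objective(b, w):
        # Weighted squared hinge loss plus a weighted ridge penalty.
        margins = 1.0 - y * (X @ b)
        loss = np.sum(pi * np.maximum(margins, 0.0) ** 2)
        return loss + lam * np.sum(w * b ** 2)

    for _ in range(n_iter):
        # Solve the current L2-penalized (weighted ridge) subproblem.
        beta = minimize(objective, beta, args=(w,), method="BFGS").x
        # Adaptive ridge update: w_j = (beta_j^2 + delta^2)^{-1}, so
        # w_j * beta_j^2 tends to 1 if beta_j != 0 and to 0 otherwise.
        w = 1.0 / (beta ** 2 + delta ** 2)

    beta[w * beta ** 2 < 0.5] = 0.0  # declare unselected coefficients zero
    return beta

# Toy usage: two informative features out of ten, asymmetric class weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] * 2.0 - X[:, 1] * 1.5 + 0.1 * rng.normal(size=200))
pi = np.where(y > 0, 0.7, 0.3)
print(fit_l0_wsvm(X, y, pi, lam=1.0))
```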

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)

References

  1. Amaldi, E. and Kann, V. (1998). On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209, 237-260. https://doi.org/10.1016/S0304-3975(97)00115-1
  2. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
  3. Frommlet, F. and Nuel, G. (2016). An adaptive ridge procedure for L0 regularization. PLoS ONE, 11, e0148620. https://doi.org/10.1371/journal.pone.0148620
  4. Fung, G. M., Mangasarian, O. L. and Smola, A. J. (2002). Minimal kernel classifiers. Journal of Machine Learning Research, 3, 303-321.
  5. Huang, K., King, I. and Lyu, M. R. (2008). Direct zero-norm optimization for feature selection. Proceedings of the Eighth IEEE International Conference on Data Mining, 845-850.
  6. Kim, K. H., Shin, S. J., Hwang, C. and Shim, J. (2017). Geographically weighted least squares-support vector machine. Journal of the Korean Data & Information Science Society, 28, 227-235. https://doi.org/10.7465/jkdi.2017.28.1.227
  7. Lin, Y., Lee, Y. and Wahba, G. (2002). Support vector machines for classification in nonstandard situations. Machine Learning, 46, 191-202. https://doi.org/10.1023/A:1012406528296
  8. Shim, J. and Seok, K. (2014). A transductive least squares support vector machine with the difference convex algorithm. Journal of the Korean Data & Information Science Society, 25, 455-464. https://doi.org/10.7465/jkdi.2014.25.2.455
  9. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58, 267-288.
  10. Vapnik, V. (2013). The nature of statistical learning theory, Springer Science & Business Media.
  11. Wang, J., Shen, X. and Liu, Y. (2008). Probability estimation for large-margin classifiers. Biometrika, 95, 149-167.
  12. Weston, J., Elisseeff, A., Scholkopf, B. and Tipping, M. (2003). Use of the zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3, 1439-1461.
  13. Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942. https://doi.org/10.1214/09-AOS729
  14. Zhang, H. H. and Lu, W. (2007). Adaptive lasso for Cox's proportional hazards model. Biometrika, 94, 691-703. https://doi.org/10.1093/biomet/asm037
  15. Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429. https://doi.org/10.1198/016214506000000735