References
- D. P. Bertsekas, A new class of incremental gradient methods for least squares problems, SIAM J. Optim. 7 (1997), no. 4, 913-926. https://doi.org/10.1137/S1052623495287022
- D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, MA, 1999.
- D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1989.
- D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim. 18 (2007), no. 1, 29-51. https://doi.org/10.1137/040615961
- P. S. Bradley, U. M. Fayyad, and O. L. Mangasarian, Mathematical programming for data mining: formulations and challenges, INFORMS J. Comput. 11 (1999), no. 3, 217-238. https://doi.org/10.1287/ijoc.11.3.217
- S. Chen, D. Donoho, and M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33-61. https://doi.org/10.1137/S1064827596304010
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, 2000.
- I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (2004), no. 11, 1413-1457. https://doi.org/10.1002/cpa.20042
- J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Report, Department of Statistics, Stanford University, Stanford, May 2009.
- A. A. Gaivoronski, Convergence properties of back-propagation for neural nets via theory of stochastic gradient methods. Part I, Optim. Methods Softw. 4 (1994), 117-134. https://doi.org/10.1080/10556789408805582
- L. Grippo, A class of unconstrained minimization methods for neural network training, Optim. Methods Softw. 4 (1994), 135-150. https://doi.org/10.1080/10556789408805583
- C.-H. Ho and C.-J. Lin, Large-scale linear support vector regression, J. Mach. Learn. Res. 13 (2012), 3323-3348.
- A. Juditsky, G. Lan, A. Nemirovski, and A. Shapiro, Stochastic approximation approach to stochastic programming, SIAM J. Optim. 19 (2009), 1574-1609. https://doi.org/10.1137/070704277
- K. Koh, S.-J. Kim, and S. Boyd, An interior-point method for large-scale ℓ1-regularized logistic regression, J. Mach. Learn. Res. 8 (2007), 1519-1555.
- S. Lee, H. Lee, P. Abbeel, and A. Y. Ng, Efficient ℓ1-regularized logistic regression, in Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
- Z.-Q. Luo and P. Tseng, Analysis of an approximate gradient projection method with applications to the backpropagation algorithm, Optim. Methods Softw. 4 (1994), 85-101. https://doi.org/10.1080/10556789408805580
- O. L. Mangasarian and D. R. Musicant, Large scale kernel regression via linear programming, Mach. Learn. 46 (2002), 255-269. https://doi.org/10.1023/A:1012422931930
- O. L. Mangasarian and M. V. Solodov, Serial and parallel backpropagation convergence via nonmonotone perturbed minimization, Optim. Methods Softw. 4 (1994), 103-116. https://doi.org/10.1080/10556789408805581
- Y. Nesterov, Primal-dual subgradient methods for convex problems, Math. Program. 120 (2009), no. 1, 221-259. https://doi.org/10.1007/s10107-007-0149-x
- R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by D. E. Rumelhart and J. L. McClelland, 318-362, MIT Press, Cambridge, MA, 1986.
- S. Sardy and P. Tseng, AMlet, RAMlet, and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets, J. Comput. Graph. Statist. 13 (2004), no. 2, 283-309. https://doi.org/10.1198/1061860043434
- R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267-288.
- P. Tseng, On the rate of convergence of a partially asynchronous gradient projection algorithm, SIAM J. Optim. 1 (1991), no. 4, 603-619. https://doi.org/10.1137/0801036
- P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), no. 1-2, 387-423. https://doi.org/10.1007/s10107-007-0170-0
- P. Tseng and S. Yun, Incrementally updated gradient methods for constrained and regularized optimization, J. Optim. Theory Appl. 160 (2014), no. 3, 832-853. https://doi.org/10.1007/s10957-013-0409-2
- V. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag, New York, 2000.
- L. Wang, Efficient regularized solution path algorithms with applications in machine learning and data mining, Ph.D. thesis, University of Michigan, 2008.
- H. White, Learning in artificial neural networks: a statistical perspective, Neural Comput. 1 (1989), 425-464. https://doi.org/10.1162/neco.1989.1.4.425
- H. White, Some asymptotic results for learning in single hidden-layer feedforward network models, J. Amer. Statist. Assoc. 84 (1989), no. 408, 1003-1013. https://doi.org/10.1080/01621459.1989.10478865
- L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. 11 (2010), 2543-2596.