References
- D. Yi, S. Ji & S. Bu. (2019). An Enhanced Optimization Scheme Based on Gradient Descent Methods for Machine Learning. Symmetry, 11(7), 942-959. https://doi.org/10.3390/sym11070942
- H. Zulkifli. (2018). Understanding Learning Rates and How It Improves Performance in Deep Learning. Towards Data Science. [Online] https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10.
- S. Lau. (2017). Learning Rate Schedules and Adaptive Learning Rate Methods for Deep Learning. Towards Data Science. [Online] https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1.
- A. Géron. (2017). Gradient Descent. Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly. pp. 113-124. ISBN 978-1-4919-6229-9.
- J. Duchi, E. Hazan & Y. Singer. (2011). Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12, 2121-2159.
- Y. LeCun, L. Bottou, Y. Bengio & P. Haffner. (1998). Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
- R. Pascanu & Y. Bengio. (2013). Revisiting natural gradient for deep networks. arXiv:1301.3584.
- J. Sohl-Dickstein, B. Poole & S. Ganguli. (2014). Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods. In Proceedings of the 31st International Conference on Machine Learning. (pp. 604-612). Beijing, China.
- P. Baldi & K. Hornik. (1989). Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1), 53-58. https://doi.org/10.1016/0893-6080(89)90014-2
- M. Zinkevich. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning. (pp. 928-936). Washington, DC, USA.
- C. T. Kelley. (1995). Iterative Methods for Linear and Nonlinear Equations (Volume 16). Frontiers in Applied Mathematics; SIAM: Philadelphia, PA, USA.
- I. Sutskever, J. Martens, G. Dahl & G. E. Hinton. (2013). On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning. (pp. 1139-1147). Atlanta, GA, USA.
- M. D. Zeiler. (2012). Adadelta: An adaptive learning rate method. arXiv:1212.5701.
- D. P. Kingma & J. L. Ba. (2015). Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA, 7-9 May 2015.
- M. J. Kochenderfer & T. A. Wheeler. (2019). Algorithms for Optimization. Cambridge, MA: The MIT Press.