Pragmatic Assessment of Optimizers in Deep Learning

Ajeet K. Jain; PVRD Prasad Rao; K. Venkatesh Sharma

doi:10.22937/IJCSNS.2023.23.10.15

International Journal of Computer Science & Network Security

Volume 23, Issue 10, pp. 115-128, 2023. pISSN 1738-7906

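Ajeet K. Jain (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation)
PVRD Prasad Rao (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation)
K. Venkatesh Sharma (Department of Computer Science and Engineering, CVR College of Engineering)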

Received : 2023.10.05
Published : 2023.10.30

Abstract

Deep learning has incorporated a variety of optimization techniques, driven by pragmatic advances in optimization algorithms, and these optimizers play a central role in machine learning. In the recent past, new variants of many optimizers have been put into practice, and their suitability and applicability have been reported across various domains. This wave of novelty ranges from Stochastic Gradient Descent to convex, non-convex, and derivative-free approaches. Given this wide range of optimizers, choosing a best-fit optimizer is an important consideration in deep learning, because these workhorse engines determine the final performance of the model. Moreover, an increasing number of deep layers brings greater complexity in hyper-parameter tuning, and consequently a need to search for a befitting optimizer. We empirically examine the most popular and widely used optimizers on various datasets and networks, such as MNIST and GANs, among others. The pragmatic comparison focuses on their similarities, their differences, and their suitability for a given application. Additionally, recent optimizer variants are highlighted along with their subtleties. The article emphasizes the critical role of optimizers and pinpoints the considerations that support choosing among them.
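
As a rough illustration of the kind of side-by-side comparison described above, the following is a minimal sketch of three widely used update rules: plain gradient descent, heavy-ball momentum, and Adam (bias-corrected first and second moment estimates). It is not the authors' experimental setup, which used datasets and networks such as MNIST and GANs. The toy quadratic objective with a 10x curvature difference between its two axes, the learning rates, the step counts, and the helper names (CURVATURE, loss, grad, gradient_descent, momentum, adam) are all hypothetical choices made for illustration. The gradients are exact rather than stochastic mini-batch estimates.

```python
import math

# Hypothetical toy objective (an assumption, not from the paper):
# f(w) = 0.5 * (1 * w0^2 + 10 * w1^2), so curvature differs 10x between axes.
CURVATURE = (1.0, 10.0)


def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURE, w))


def grad(w):
    # Exact (full-batch) gradient; a real SGD run would use noisy mini-batch estimates.
    return [c * wi for c, wi in zip(CURVATURE, w)]


def gradient_descent(w, lr=0.05, steps=200):
    # Plain gradient step: w <- w - lr * g.
    w = list(w)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def momentum(w, lr=0.05, beta=0.9, steps=200):
    # Heavy-ball momentum: v accumulates an exponentially decaying sum of past gradients.
    w = list(w)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w


def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    # Adam: per-parameter step sizes from bias-corrected moment estimates.
    w = list(w)
    m = [0.0] * len(w)  # first moment (running mean of gradients)
    s = [0.0] * len(w)  # second moment (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        s = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        s_hat = [si / (1 - beta2 ** t) for si in s]
        w = [wi - lr * mh / (math.sqrt(sh) + eps)
             for wi, mh, sh in zip(w, m_hat, s_hat)]
    return w


if __name__ == "__main__":
    start = [1.0, 1.0]
    print("initial loss", loss(start))
    for name, opt in [("gradient descent", gradient_descent),
                      ("momentum", momentum),
                      ("adam", adam)]:
        print(name, loss(opt(start)))
```

Running the script prints the final loss reached by each optimizer from the same starting point. Even on this toy problem, the relative ranking shifts as the learning rate and momentum coefficients change, which echoes the point above that hyper-parameter tuning and optimizer choice are intertwined.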
