Gradient Explosion Free Algorithm for Training Recurrent Neural Networks

  • Hong, Seoyoung (Department of Mathematics, Ewha Womans University)
  • Jeon, Hyerin (Department of Mathematics, Ewha Womans University)
  • Lee, Byungjoon (Department of Mathematics, The Catholic University of Korea)
  • Min, Chohong (Department of Mathematics, Ewha Womans University)
  • Received : 2020.08.31
  • Accepted : 2020.11.08
  • Published : 2020.12.25

Abstract

The exploding gradient is a widely known problem in training recurrent neural networks. The explosion has often been handled by cutting off the gradient norm at some fixed value. However, this strategy, commonly referred to as norm clipping, is an ad hoc way to attenuate the explosion. In this research, we instead view the problem from a different perspective, that of discrete-time optimal control with an infinite horizon, to gain a better understanding of it. Through this perspective, we characterize the region in which gradient explosion occurs. Based on this analysis, we introduce a gradient-explosion-free algorithm that keeps the training process away from that region. Numerical tests show that this algorithm is at least three times faster than the clipping strategy.
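
For context, the norm-clipping baseline that the abstract contrasts with simply rescales the gradient whenever its norm exceeds a fixed threshold. Below is a minimal NumPy sketch of that baseline only, not of the gradient-explosion-free algorithm proposed in the paper; the function name and the threshold value are illustrative assumptions.

```python
import numpy as np

def clip_gradient_norm(grad, max_norm=5.0):
    """Rescale grad so its Euclidean norm does not exceed max_norm.

    This is the ad hoc norm-clipping strategy mentioned in the abstract;
    max_norm is a hand-tuned hyperparameter, not a value from the paper.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# Example: a gradient of norm 10 is scaled down to norm 5.
g = np.array([6.0, 8.0])
print(np.linalg.norm(clip_gradient_norm(g)))  # prints 5.0
```

By contrast, the algorithm proposed in the paper aims to keep training out of the explosion region altogether rather than truncating gradients after they have already blown up.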

Acknowledgement

The research of Chohong Min was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant No. 2019R1A6A1A11051177). The research of Byungjoon Lee was supported by the POSCO Science Fellowship of the POSCO TJ Park Foundation and by NRF grant 2020R1A2C4002378.
