Improvement of the Convergence Rate of Deep Learning by Using Scaling Method

  • Ho, Jiacang (Department of Ubiquitous IT, Graduate School, Dongseo University)
  • Kang, Dae-Ki (Department of Computer Engineering, Dongseo University)
  • Received : 2017.11.10
  • Reviewed : 2017.12.05
  • Published : 2017.12.31


Deep learning neural networks have become very popular because they can learn very complex datasets, such as image datasets. Although a deep neural network can achieve high accuracy on image datasets, it needs a long time to converge. To address this issue, we propose a scaling method that lets the neural network converge in less time than the original method. Our results show that the proposed algorithm performs better than previous work.


