Comparison of Hyper-Parameter Optimization Methods for Deep Neural Networks

  • Kim, Ho-Chan (Dept. of Electrical Eng., Jeju National Univ.)
  • Kang, Min-Jae (Dept. of Electrical Eng., Jeju National Univ.)
  • Received : 2020.11.24
  • Accepted : 2020.12.18
  • Published : 2020.12.31

Abstract

Research into hyper-parameter optimization (HPO) has recently revived with the rise of models containing many hyper-parameters, such as deep neural networks. In this paper, we introduce the most widely used HPO methods, namely grid search, random search, and Bayesian optimization, and investigate their characteristics through experiments. The MNIST data set is used to compare the methods and to identify which achieves the highest accuracy within a relatively short simulation time. The learning rate and weight decay were chosen as the hyper-parameters to tune because they are the ones most commonly optimized in experiments of this kind.
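The three methods named above differ mainly in how they propose the next trial: grid search exhaustively enumerates a fixed grid, random search samples each hyper-parameter independently (often log-uniformly for learning rate and weight decay), and Bayesian optimization fits a surrogate model to past trials to pick promising points. Below is a minimal, hypothetical sketch of the first two searches; the objective train_and_evaluate, its peak location, and the search ranges are illustrative placeholders, not values from the paper.

```python
# Sketch (not the authors' code) of grid search vs. random search over the
# two hyper-parameters studied in the paper: learning rate and weight decay.
import math
import random

def train_and_evaluate(lr, wd):
    # Placeholder objective peaking near lr=1e-3, wd=1e-4 (assumed values,
    # purely for illustration). A real run would train a network on MNIST
    # and return validation accuracy.
    return math.exp(-((math.log10(lr) + 3) ** 2 + (math.log10(wd) + 4) ** 2))

def grid_search(lrs, wds):
    # Evaluate every (lr, wd) pair on the grid and keep the best one.
    return max(((lr, wd) for lr in lrs for wd in wds),
               key=lambda p: train_and_evaluate(*p))

def random_search(n_trials, seed=0):
    # Sample both hyper-parameters log-uniformly, as is common practice
    # for learning rate and weight decay.
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-5, -1), 10 ** rng.uniform(-6, -2))
              for _ in range(n_trials)]
    return max(trials, key=lambda p: train_and_evaluate(*p))

if __name__ == "__main__":
    print("grid  :", grid_search([1e-4, 1e-3, 1e-2], [1e-5, 1e-4, 1e-3]))
    print("random:", random_search(n_trials=9))
```

With the same budget of nine trials, random search tries nine distinct values of each hyper-parameter, while the 3x3 grid tries only three, which is the usual argument for preferring random search when only a few hyper-parameters matter.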

Acknowledgement

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2018R1D1A1B07045976) in 2018.
