The Effect of Hyperparameter Choice on ReLU and SELU Activation Function

  • Kevin, Pratama (Department of Ubiquitous IT, Graduate School, Dongseo University) ;
  • Kang, Dae-Ki (Department of Computer Engineering, Dongseo University)
  • Received : 2017.11.10
  • Accepted : 2017.12.07
  • Published : 2017.12.31


Convolutional Neural Networks (CNNs) have shown excellent performance in computer vision tasks. Applications of CNNs include image classification, object detection, and autonomous driving. This paper evaluates the performance of a CNN model with ReLU and SELU as activation functions. The evaluation covers four hyperparameter choices: initialization method, network configuration, optimization technique, and regularization. We run experiments on each hyperparameter choice and show how it influences network convergence and test accuracy. In these experiments, we also observe a performance improvement when using SELU instead of ReLU as the activation function.
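
For readers unfamiliar with the two activation functions compared here, the following is a minimal scalar sketch of their standard definitions. It is not taken from the paper's experimental code. The SELU constants are the commonly published values from the self-normalizing networks formulation by Klambauer et al., and the function and constant names are illustrative assumptions.

```python
import math

# Commonly published SELU constants (self-normalizing networks formulation).
# The names SELU_ALPHA and SELU_SCALE are illustrative, not from the paper.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x: float) -> float:
    """ReLU: pass positive inputs through, clamp negative inputs to zero."""
    return max(0.0, x)


def selu(x: float) -> float:
    """SELU: scaled identity for x > 0, scaled exponential curve otherwise."""
    if x > 0:
        return SELU_SCALE * x
    # For x <= 0 the output saturates toward -SELU_SCALE * SELU_ALPHA.
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={value:+.1f}  relu={relu(value):+.4f}  selu={selu(value):+.4f}")
```

Unlike ReLU, SELU produces nonzero outputs for negative inputs, which is the property the self-normalizing argument relies on to keep activations near zero mean and unit variance.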


Supported by: National Research Foundation of Korea (NRF)

