Performance Improvement Method of Fully Connected Neural Network Using Combined Parametric Activation Functions

  • Received : 2021.05.31
  • Accepted : 2021.07.28
  • Published : 2022.01.31

Abstract

Deep neural networks are widely used to solve a variety of problems. In a fully connected neural network, the nonlinear activation function nonlinearly transforms the output of the preceding linear transformation; it plays an essential role in solving nonlinear problems, and many nonlinear activation functions have been studied. In this study, we propose a combined parametric activation function that can improve the performance of fully connected neural networks. A combined parametric activation function is created simply by summing parametric activation functions. A parametric activation function introduces parameters that adjust the scale and location of the activation function according to the input data, so that it can be optimized in the direction that minimizes the loss function. By combining parametric activation functions, more diverse nonlinear intervals can be represented, and the parameters of each component function are likewise optimized to minimize the loss function. The performance of the combined parametric activation function was evaluated on the MNIST and Fashion MNIST classification problems, and the results confirm that it outperforms the existing nonlinear activation functions and the single parametric activation function.
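The abstract does not give the exact functional form, so the following is only a minimal sketch of the idea in PyTorch. It assumes a parametric activation of the form a·f(b·x + c), where the trainable parameters a, b, and c change the scale and location of a base activation f, and builds the combined version as a sum of such units. The class names, the choice of sigmoid and tanh as base functions, and the layer sizes are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn as nn


class ParametricActivation(nn.Module):
    """Base activation f with trainable scale and location parameters (assumed form a*f(b*x + c))."""
    def __init__(self, base=torch.sigmoid):
        super().__init__()
        self.base = base
        self.a = nn.Parameter(torch.ones(1))   # output scale
        self.b = nn.Parameter(torch.ones(1))   # input scale
        self.c = nn.Parameter(torch.zeros(1))  # location shift

    def forward(self, x):
        return self.a * self.base(self.b * x + self.c)


class CombinedParametricActivation(nn.Module):
    """Combined parametric activation: a sum of parametric activation functions."""
    def __init__(self, bases=(torch.sigmoid, torch.tanh)):
        super().__init__()
        self.units = nn.ModuleList(ParametricActivation(f) for f in bases)

    def forward(self, x):
        # Summing the component functions produces a richer set of nonlinear intervals.
        return sum(unit(x) for unit in self.units)


# Fully connected network for 28x28 inputs (MNIST / Fashion MNIST style).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    CombinedParametricActivation(),
    nn.Linear(256, 10),
)
```

Because a, b, and c are registered as nn.Parameter, they receive gradients from the classification loss and are updated by the same optimizer step as the network weights, which corresponds to the loss-minimizing optimization of the activation-function parameters described in the abstract.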
