Initialization by using truncated distributions in artificial neural network

Kim, MinJong; Cho, Sungchul; Jeong, Hyerin; Lee, YungSeop; Lim, Changwon

doi:10.5351/KJAS.2019.32.5.693

The Korean Journal of Applied Statistics (응용통계연구)

Volume 32, Issue 5 / Pages 693-702 / 2019 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

절단된 분포를 이용한 인공신경망에서의 초기값 설정방법

Kim, MinJong (Department of Applied Statistics, Chung-Ang University) ;
Cho, Sungchul (Department of Applied Statistics, Chung-Ang University) ;
Jeong, Hyerin (Department of Applied Statistics, Chung-Ang University) ;
Lee, YungSeop (Department of Statistics, Dongguk University) ;
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)

김민종 (중앙대학교 응용통계학과) ;
조성철 (중앙대학교 응용통계학과) ;
정혜린 (중앙대학교 응용통계학과) ;
이영섭 (동국대학교 통계학과) ;
임창원 (중앙대학교 응용통계학과)

Received : 2019.06.24
Accepted : 2019.08.20
Published : 2019.10.31

https://doi.org/10.5351/KJAS.2019.32.5.693

Abstract

Deep learning has gained popularity for classification and prediction tasks. Neural networks become deeper as more data becomes available. Saturation is the phenomenon in which the gradient of an activation function approaches 0; it can occur when the weights are too large. The issue has drawn increasing attention because saturation limits the ability of the weights to learn. To resolve this problem, Glorot and Bengio (Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249-256, 2010) claimed that efficient neural network training is possible when data flows with sufficient variety between layers. They argued that, for this to happen, the variance of the output of each layer and the variance of its input should be equal, and they proposed an initialization method that satisfies this condition. In this paper, we propose a new initialization method based on the truncated normal distribution and the truncated Cauchy distribution. We decide where to truncate the distribution while adapting the initialization method of Glorot and Bengio (2010): the variances of the output and the input are made equal, and this common value is set equal to the variance of the truncated distribution. This shapes the distribution so that large initial weights are ruled out and the initial values concentrate near zero. To compare the performance of the proposed method with existing methods, we conducted experiments on the MNIST and CIFAR-10 data using DNN and CNN models. The proposed method outperformed existing methods in terms of accuracy.
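The variance-matching idea for the truncated normal case can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state where the distribution is truncated, so the sketch assumes a symmetric cut at `a` parent standard deviations (default `a=2.0`), takes the Glorot and Bengio (2010) target variance 2 / (fan_in + fan_out), and solves for the parent standard deviation that makes the truncated variance hit that target. All function names are hypothetical.

```python
import math
import random


def glorot_variance(fan_in: int, fan_out: int) -> float:
    """Target weight variance of Glorot and Bengio (2010)."""
    return 2.0 / (fan_in + fan_out)


def truncated_normal_variance_ratio(a: float) -> float:
    """Variance of a standard normal truncated to [-a, a].

    Uses Var = 1 - 2*a*phi(a) / (2*Phi(a) - 1), where
    2*Phi(a) - 1 = erf(a / sqrt(2)).
    """
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    mass = math.erf(a / math.sqrt(2.0))  # P(|Z| <= a)
    return 1.0 - 2.0 * a * pdf / mass


def parent_sigma(fan_in: int, fan_out: int, a: float = 2.0) -> float:
    """Parent std. dev. such that the truncated variance equals the target."""
    target = glorot_variance(fan_in, fan_out)
    return math.sqrt(target / truncated_normal_variance_ratio(a))


def sample_weights(fan_in: int, fan_out: int, a: float = 2.0, seed: int = 0):
    """Draw a fan_in x fan_out weight matrix by rejection sampling.

    Assumption: truncation at +/- a parent standard deviations.
    """
    rng = random.Random(seed)
    sigma = parent_sigma(fan_in, fan_out, a)
    bound = a * sigma
    weights = []
    for _ in range(fan_in):
        row = []
        while len(row) < fan_out:
            w = rng.gauss(0.0, sigma)
            if abs(w) <= bound:  # keep only draws inside the cut
                row.append(w)
        weights.append(row)
    return weights
```

Calling `sample_weights(256, 128)` returns a 256 x 128 nested list whose entries are bounded by `2.0 * parent_sigma(256, 128)` and whose empirical variance should sit close to `glorot_variance(256, 128)`. Rescaling the parent standard deviation is one way to keep the target variance after truncation; the paper may instead fix the scale and solve for the truncation point.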

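Deep learning is in the spotlight as a method for classifying and predicting from large volumes of data. As the amount of data grows, neural network architectures are becoming deeper. When the initial values are too large, saturation occurs: the gradient of the activation function becomes very small as the layers get deeper. Because saturation degrades the ability of the weights to learn, the choice of initial values is becoming more important. To address saturation, Glorot and Bengio (2010) and He et al. (2015) argued that efficient neural network training is possible only when data flows with sufficient variety between layers. They proposed that, for data to flow in this way, the variance of each layer's output and the variance of its input must be equal. Glorot and Bengio (2010) and He et al. (2015) set the initial values under the assumption that the variance of the activations is the same across layers. In this paper, we propose setting the initial values using the truncated Cauchy distribution and the truncated normal distribution. By making the variance of the output equal to the variance of the input, and setting that value equal to the variance of the truncated probability distribution, we adjust the distribution so that large initial values are restricted and values close to zero are produced. Applying the proposed method to DNN and CNN models on the MNIST and CIFAR-10 data, we show experimentally that it improves model performance over existing initialization methods.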
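Truncation plays a particular role in the Cauchy case: an untruncated Cauchy distribution has no finite variance, so a variance-matching condition can only be imposed once the distribution is truncated. As a worked sketch, assume a zero-centered Cauchy distribution with scale $\gamma$ truncated symmetrically to $[-c, c]$, and take the Glorot and Bengio (2010) target variance $2/(n_{in}+n_{out})$. The symbols $\gamma$, $c$, $n_{in}$, $n_{out}$ and the symmetric truncation are assumptions made for illustration, not notation taken from the paper.

```latex
\begin{align}
f(x) &= \frac{\gamma}{2\arctan(c/\gamma)\,\bigl(x^{2}+\gamma^{2}\bigr)},
      \qquad -c \le x \le c, \\
\operatorname{Var}(W) &= \int_{-c}^{c} x^{2} f(x)\,dx
      = \frac{\gamma\,c}{\arctan(c/\gamma)} - \gamma^{2}, \\
\frac{\gamma\,c}{\arctan(c/\gamma)} - \gamma^{2} &= \frac{2}{n_{in}+n_{out}}.
\end{align}
```

The second line follows from $\int x^{2}/(x^{2}+\gamma^{2})\,dx = x - \gamma\arctan(x/\gamma)$. If the ratio $k = c/\gamma$ is held fixed, the left-hand side of the last line becomes $\gamma^{2}\,(k/\arctan k - 1)$, so the condition determines $\gamma$ in closed form.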

References

Clevert, D. A., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249-256).
Goodfellow, I. J., Vinyals, O., and Saxe, A. M. (2014). Qualitatively characterizing neural network optimization problems. arXiv preprint arXiv:1412.6544.
Hanin, B. and Rolnick, D. (2018). How to start training: The effect of initialization and architecture. In Advances in Neural Information Processing Systems (pp. 571-581).
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026-1034).
Hayou, S., Doucet, A., and Rousseau, J. (2018). On the selection of initialization and activation function for deep neural networks. arXiv preprint arXiv:1805.08266.
Humbird, K. D., Peterson, J. L., and McClarren, R. G. (2018). Deep neural network initialization with decision trees, IEEE Transactions on Neural Networks and Learning Systems, 30, 1286-1295.
Krahenbuhl, P., Doersch, C., Donahue, J., and Darrell, T. (2015). Data-dependent initializations of convolutional neural networks. arXiv preprint arXiv:1511.06856.
Krizhevsky, A. and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images (Vol. 1, No. 4, p. 7) Technical report, University of Toronto.
LeCun, Y., Bottou, L., Orr, G., and Muller, K. (1998a). Efficient backprop. In Neural Networks: Tricks of the Trade (Orr, G. and Muller, K., eds.), Lecture Notes in Computer Science, 1524(98), 111.
LeCun, Y., Cortes, C., and Burges, C. J. (1998b). The MNIST Database of Handwritten Digits.
Mishkin, D. and Matas, J. (2015). All you need is a good init. arXiv preprint arXiv:1511.06422.
Sutskever, I., Martens, J., Dahl, G., and Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning (pp. 1139-1147).
