• Title/Summary/Keyword: XOR learning

Alleviation of Vanishing Gradient Problem Using Parametric Activation Functions (파라메트릭 활성함수를 이용한 기울기 소실 문제의 완화)

  • Ko, Young Min;Ko, Sun Woo
    • KIPS Transactions on Software and Data Engineering / v.10 no.10 / pp.407-420 / 2021
  • Deep neural networks are widely used to solve a variety of problems. However, deep neural networks with many hidden layers frequently suffer from vanishing or exploding gradients, which are major obstacles to training. In this paper, we propose a parametric activation function to alleviate the vanishing gradient problem that can be caused by the nonlinear activation function. The proposed parametric activation function is obtained by introducing parameters that adjust the scale and location of the activation function according to the characteristics of the input data, so that the loss function can be minimized through backpropagation without the derivative of the activation function being a limiting factor. On an XOR problem with 10 hidden layers and an MNIST classification problem with 8 hidden layers, the original nonlinear activation functions and the parametric activation functions were compared, and the proposed parametric activation function was confirmed to be superior in alleviating the vanishing gradient (a code sketch follows this entry).

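The entry above describes an activation function whose scale and location are trainable. Below is a minimal, hedged sketch of that idea, assuming the parametrization f(x) = sigmoid(s * (x - t)) with trainable s and t; the exact form, the 10-layer architecture, and the training setup are illustrative choices, not taken from the paper.

```python
# Sketch: a sigmoid with trainable scale s and location t, stacked in a
# deep MLP and trained on XOR. Assumes f(x) = sigmoid(s * (x - t)).
import torch
import torch.nn as nn

class ParametricSigmoid(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))   # s: steepens/flattens the curve
        self.loc = nn.Parameter(torch.zeros(1))    # t: shifts the curve left/right

    def forward(self, x):
        return torch.sigmoid(self.scale * (x - self.loc))

def deep_mlp(depth=10, width=8, act=ParametricSigmoid):
    layers, d_in = [], 2
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), act()]
        d_in = width
    layers += [nn.Linear(d_in, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = deep_mlp()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCELoss()
for step in range(3000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()          # the scale and location parameters receive gradients too
    opt.step()
print(model(X).detach().round().squeeze())   # predictions on the four XOR patterns
```
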
A piecewise affine approximation of sigmoid activation functions in multi-layered perceptrons and a comparison with a quantization scheme (다중계층 퍼셉트론 내 Sigmoid 활성함수의 구간 선형 근사와 양자화 근사와의 비교)

  • 윤병문;신요안
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.2 / pp.56-64 / 1998
  • Multi-layered perceptrons, a nonlinear neural network model, have been widely used for various applications, mainly thanks to their good approximation capability for nonlinear functions. However, for digital hardware implementation of multi-layered perceptrons, a quantization scheme using "look-up tables (LUTs)" is commonly employed to handle the nonlinear sigmoid activation functions in the networks, and it requires a large amount of storage to keep quantization errors acceptable. This paper is concerned with a new, effective methodology for digital hardware implementation of multi-layered perceptrons, and proposes a "piecewise affine approximation" method in which the input domain is divided into a small number of sub-intervals and the nonlinear sigmoid function is linearly approximated within each sub-interval. Using the proposed method, we develop an expression and an error back-propagation type learning algorithm for a multi-layered perceptron, and compare its performance with the quantization method through Monte Carlo simulations on XOR problems. Simulation results show that, in terms of learning convergence, the proposed method with a small number of sub-intervals significantly outperforms the quantization method even when the latter uses a very large amount of storage. We expect from these results that the proposed method can be utilized in digital system implementations to significantly reduce the storage requirement, quantization error, and learning time of the quantization method (a code sketch follows this entry).

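The two schemes this paper compares, piecewise affine approximation over a few sub-intervals versus a look-up table, can be sketched directly in NumPy. The input range and the number of sub-intervals below are illustrative assumptions; both schemes are given the same 9 stored values for a like-for-like comparison.

```python
# Sketch: piecewise affine approximation of sigmoid vs. LUT quantization,
# both using the same 9 stored breakpoints over [-8, 8].
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# (a) piecewise affine: 8 sub-intervals -> 9 breakpoints, linear in between
xp = np.linspace(-8.0, 8.0, 9)
fp = sigmoid(xp)
def sigmoid_affine(x):
    return np.interp(x, xp, fp)

# (b) LUT quantization with the same storage: round x to the nearest table point
def sigmoid_lut(x):
    idx = np.clip(np.round((x + 8.0) / 16.0 * (len(xp) - 1)), 0, len(xp) - 1).astype(int)
    return fp[idx]

x = np.linspace(-8.0, 8.0, 100001)
print("max |error|, piecewise affine:", np.abs(sigmoid_affine(x) - sigmoid(x)).max())
print("max |error|, 9-entry LUT:     ", np.abs(sigmoid_lut(x) - sigmoid(x)).max())
```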

Multilayer Neural Network Using Delta Rule: Recognitron III (텔타규칙을 이용한 다단계 신경회로망 컴퓨터:Recognitron III)

  • 김춘석;박충규;이기한;황희영
    • The Transactions of the Korean Institute of Electrical Engineers / v.40 no.2 / pp.224-233 / 1991
  • The multilayer expansion of single-layer NNs (neural networks) was needed to solve the linear separability problem, as shown by the classic example of the XOR function. The EBP (error back-propagation) learning rule is often used in multilayer neural networks, but it is not without its faults: 1) D. Rumelhart expanded the delta rule, but there is a problem in obtaining Ca from the linear combination of the weight matrix N between the hidden layer and the output layer and H, which is itself the result of a linear combination of the input pattern and the weight matrix M between the input layer and the hidden layer. 2) Even if using the difference between Ca and Da to adjust the weight matrix N between the hidden layer and the output layer is valid, using the same value to adjust the weight matrix M between the input layer and the hidden layer is not. Recognitron III was proposed to resolve these faults. According to the simulation results, since Recognitron III does not train the three-layer NN as a whole but divides it into several single-layer NNs and trains these with the learning patterns, it learns 32.5 to 72.2 times faster than the EBP NN. The number of patterns learned by an EBP NN with n input and output cells and n+1 hidden cells is 2^n, but n in a Recognitron III of the same size [5]. In the case of pattern generalization, however, the EBP NN falls short of Recognitron III (an EBP sketch on XOR follows this entry).

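The EBP (generalized delta rule) baseline that Recognitron III is compared against can be sketched as follows. This is the standard back-propagation recipe on XOR, reusing the abstract's notation (M, N, H, Ca, Da); it is not Recognitron III itself, and the 2-3-1 architecture, learning rate, and epoch count are illustrative.

```python
# Sketch: plain EBP (error back-propagation) on XOR with a 2-3-1 network.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)      # desired outputs (Da)

M = rng.normal(0, 1, (2, 3))     # input -> hidden weight matrix
N = rng.normal(0, 1, (3, 1))     # hidden -> output weight matrix
bh, bo = np.zeros(3), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for epoch in range(10000):
    H = sigmoid(X @ M + bh)              # hidden activations (H)
    C = sigmoid(H @ N + bo)              # computed outputs (Ca)
    # delta rule applied layer by layer, propagating the output error back
    delta_o = (C - D) * C * (1 - C)
    delta_h = (delta_o @ N.T) * H * (1 - H)
    N -= lr * H.T @ delta_o; bo -= lr * delta_o.sum(0)
    M -= lr * X.T @ delta_h; bh -= lr * delta_h.sum(0)

# typically 0. 1. 1. 0. after training
print(np.round(sigmoid(sigmoid(X @ M + bh) @ N + bo)).ravel())
```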

Performance Improvement Method of Deep Neural Network Using Parametric Activation Functions (파라메트릭 활성함수를 이용한 심층신경망의 성능향상 방법)

  • Kong, Nayoung;Ko, Sunwoo
    • The Journal of the Korea Contents Association / v.21 no.3 / pp.616-625 / 2021
  • Deep neural networks are an approximation method that approximates an arbitrary function with a linear model and then repeatedly refines the approximation using a nonlinear activation function. In this process, the loss function is used to evaluate the quality of the approximation. Existing deep learning methods take the loss function into account in the linear approximation step, but the nonlinear approximation step that uses activation functions applies a nonlinear transformation that is not tied to reducing the loss function. This study proposes parametric activation functions that introduce a scale parameter, which can change the scale of the activation function, and a location parameter, which can change its location. By introducing parametric activation functions based on scale and location parameters, the performance of the nonlinear approximation carried out by the activation functions can be improved. The scale and location parameters in each hidden layer are determined during training, using the first derivative of the loss function with respect to the parameters in backpropagation, so that the loss function value is minimized and the performance of the deep neural network improves. On MNIST classification problems and XOR problems, the parametric activation functions were found to outperform the existing activation functions (a code sketch follows this entry).

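The abstract describes updating the scale and location parameters with the first derivative of the loss function during backpropagation. The sketch below spells out those derivatives under the assumed parametrization f(x; s, t) = sigmoid(s * (x - t)); the parametrization, learning rate, and upstream gradient value are illustrative, not the paper's exact formulation.

```python
# Sketch: analytic derivatives of a parametric sigmoid f(x; s, t) = sigmoid(s*(x-t)),
# checked against finite differences, followed by one gradient step on s and t.
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def parametric_sigmoid_grads(x, s, t):
    u = s * (x - t)
    f = sigmoid(u)
    dfu = f * (1 - f)                            # sigmoid'(u)
    return f, s * dfu, (x - t) * dfu, -s * dfu   # f, df/dx, df/ds, df/dt

# quick finite-difference check of the analytic derivatives
x, s, t, eps = 0.7, 1.5, -0.2, 1e-6
f, dfdx, dfds, dfdt = parametric_sigmoid_grads(x, s, t)
print(np.isclose(dfds, (parametric_sigmoid_grads(x, s + eps, t)[0] - f) / eps))
print(np.isclose(dfdt, (parametric_sigmoid_grads(x, s, t + eps)[0] - f) / eps))

# one gradient-descent step on the parameters, given an upstream gradient dL/df
dL_df, lr = 0.3, 0.1
s -= lr * dL_df * dfds      # chain rule: dL/ds = dL/df * df/ds
t -= lr * dL_df * dfdt      # chain rule: dL/dt = dL/df * df/dt
```
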
Fast Learning Algorithms for Neural Network Using Tabu Search Method with Random Moves (Random Tabu 탐색법을 이용한 신경회로망의 고속학습알고리즘에 관한 연구)

  • 양보석;신광재;최원호
    • Journal of the Korean Institute of Intelligent Systems / v.5 no.3 / pp.83-91 / 1995
  • A neural network with one or more layers of hidden units can be trained using the well-known error back-propagation algorithm. According to this algorithm, the synaptic weights of the network are updated during training by propagating back the error between the expected output and the output produced by the network. However, the error back-propagation algorithm is characterized by slow convergence and long training times and, in some situations, can be trapped in local minima. A theoretical formulation of a new fast learning method based on the tabu search method is presented in this paper. In contrast to the conventional back-propagation algorithm, which is based solely on the trial-and-error modification of the connection weights of the network, the present method involves the calculation of the optimum weights of the neural network. The effectiveness and versatility of the present method are verified on the XOR problem, where it excels in accuracy compared with the conventional method with fixed values (a code sketch follows this entry).

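A hedged sketch of the general idea, tabu search with random moves over the flattened weight vector of a small XOR network, is given below. The network size, move distribution, tabu signature, and aspiration rule are illustrative assumptions, not the authors' exact algorithm.

```python
# Sketch: tabu search with random moves over the weights of a 2-3-1 XOR network.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss(w):
    W1, b1, W2, b2 = w[:6].reshape(2, 3), w[6:9], w[9:12], w[12]
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.mean((out - y) ** 2)

def tabu_key(w):                      # coarse signature of a visited solution
    return tuple(np.round(w, 1))

w = rng.normal(0, 1, 13)
best_w, best_loss = w.copy(), loss(w)
tabu = set()                          # a real implementation would bound the tenure

for it in range(5000):
    tabu.add(tabu_key(w))
    # random moves: perturb the current solution to generate candidates
    cands = sorted((w + rng.normal(0, 0.5, w.size) for _ in range(20)), key=loss)
    for c in cands:
        # take the best non-tabu candidate, even if worse than the current point;
        # aspiration: a tabu candidate is allowed if it beats the best found so far
        if tabu_key(c) not in tabu or loss(c) < best_loss:
            w = c
            break
    if loss(w) < best_loss:
        best_w, best_loss = w.copy(), loss(w)

print("best MSE on XOR:", best_loss)
```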

Efficient weight initialization method in multi-layer perceptrons

  • Han, Jaemin;Sung, Shijoong;Hyun, Changho
    • Proceedings of the Korean Operations and Management Science Society Conference / 1995.09a / pp.325-333 / 1995
  • Back-propagation is the most widely used algorithm for supervised learning in multi-layer feed-forward networks. However, back-propagation is very slow to converge. In this paper, a new weight initialization method for multi-layer perceptrons, called rough map initialization, is proposed. To overcome the long convergence time, possibly due to the random initialization of the weights in existing multi-layer perceptrons, the rough map initialization method initializes the weights by exploiting the relationship between input and output features with a singular value decomposition technique. The results of this initialization procedure are compared with those of random initialization on encoder problems and XOR problems (a code sketch follows this entry).

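The abstract only states that weights are initialized from the input-output relationship via singular value decomposition, so the sketch below is one plausible, clearly hypothetical reading of that idea (an SVD of the input-output cross matrix feeding the first-layer weights); the function name, scaling, and layer sizes are illustrative, not the paper's procedure.

```python
# Sketch (hypothetical): initialize input->hidden weights from the SVD of X^T Y
# instead of at random, then hand the network to ordinary back-propagation.
import numpy as np

def svd_init(X, Y, n_hidden, rng):
    """Seed first-layer weights with directions that relate inputs to outputs."""
    C = X.T @ Y                                   # (n_in, n_out) input-output cross matrix
    U, S, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_hidden, U.shape[1])
    W = np.zeros((X.shape[1], n_hidden))
    W[:, :k] = U[:, :k]                           # leading singular directions
    W[:, k:] = rng.normal(0, 0.1, (X.shape[1], n_hidden - k))  # fill remaining units
    return W

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1 = svd_init(X, Y, n_hidden=3, rng=rng)
print(W1)   # a non-random starting point for back-propagation on XOR
```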