• Title/Summary/Keyword: Stochastic Gradient Descent


Comparison of Gradient Descent for Deep Learning (딥러닝을 위한 경사하강법 비교)

  • Kang, Min-Jae
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.2 / pp.189-194 / 2020
  • This paper analyzes the gradient descent method, which is the method most used for learning neural networks. Learning means updating the parameters so that the loss function, which quantifies the difference between actual and predicted values, reaches its minimum. The gradient descent method uses the slope of the loss function to update the parameters so as to minimize the error, and it is implemented in the libraries that provide the best current deep learning algorithms. However, these algorithms are provided in the form of a black box, making it difficult to identify the advantages and disadvantages of the various gradient descent methods. This paper analyzes the characteristics of four gradient descent methods in current use: the stochastic gradient descent method, the momentum method, the AdaGrad method, and the Adadelta method. The experiments used the Modified National Institute of Standards and Technology (MNIST) data set, which is widely used to verify neural networks. The network has two hidden layers: the first with 500 neurons and the second with 300. The activation function of the output layer is the softmax function, and the rectified linear unit function is used for the input and hidden layers. Cross-entropy error is used as the loss function.
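For reference, the four update rules compared in this paper can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's code; the hyperparameter values (lr, beta, rho, eps) are common defaults assumed here, not the settings used in the paper's experiments.

```python
import numpy as np

# Minimal sketches of the four update rules compared in the paper.

def sgd(w, grad, lr=0.01):
    return w - lr * grad

def momentum(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v - lr * grad                 # accumulate a velocity term
    return w + v, v

def adagrad(w, grad, h, lr=0.01, eps=1e-8):
    h = h + grad**2                          # per-parameter sum of squared gradients
    return w - lr * grad / (np.sqrt(h) + eps), h

def adadelta(w, grad, eg, ed, rho=0.95, eps=1e-6):
    eg = rho * eg + (1 - rho) * grad**2      # running average of squared gradients
    step = -np.sqrt(ed + eps) / np.sqrt(eg + eps) * grad
    ed = rho * ed + (1 - rho) * step**2      # running average of squared updates
    return w + step, eg, ed
```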

Nonlinear optimization algorithm using monotonically increasing quantization resolution

  • Jinwuk Seok;Jeong-Si Kim
    • ETRI Journal / v.45 no.1 / pp.119-130 / 2023
  • We propose a quantized gradient search algorithm that can achieve global optimization by monotonically reducing the quantization step over time, where the quantization maps the iterates of an optimization algorithm to integer or fixed-point fractional values. According to the white noise hypothesis, if the quantization step is sufficiently small and the quantization is well defined, the round-off error caused by quantization can be regarded as an independent and identically distributed random variable. Thus, we rewrite the gradient-descent search equation as a stochastic differential equation and, by stochastic analysis of the objective function, obtain the monotonically decreasing rate of the quantization step that enables global optimization. Consequently, when the search equation is quantized with a monotonically decreasing quantization step, which suitably reduces the round-off error, we can derive a global search algorithm evolving from the optimization algorithm. Numerical simulations indicate that, owing to this quantization-based global optimization property, the proposed algorithm explores the search space better at each iteration than the conventional algorithm, with a higher success rate and fewer iterations.
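The core mechanism can be sketched as follows: quantize each gradient-descent iterate and shrink the quantization step over time. This is a toy reading of the abstract, assuming a simple geometric decay of the step; the paper derives a specific decrease rate from its stochastic-differential-equation analysis, which is not reproduced here.

```python
import numpy as np

def quantize(x, step):
    """Round x to the nearest multiple of the quantization step."""
    return step * np.round(x / step)

def quantized_gradient_search(grad_fn, w0, lr=0.1, step0=1.0, decay=0.99, iters=500):
    """Gradient search whose iterates are quantized with a monotonically
    shrinking step: early round-off acts like exploratory noise, while
    late iterations refine the solution at high resolution."""
    w, step = np.asarray(w0, dtype=float), step0
    for _ in range(iters):
        w = quantize(w - lr * grad_fn(w), step)
        step *= decay                        # monotonically decreasing resolution
    return w

# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3).
w_star = quantized_gradient_search(lambda w: 2 * (w - 3.0), np.array([10.0]))
```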

A survey on parallel training algorithms for deep neural networks (심층 신경망 병렬 학습 방법 연구 동향)

  • Yook, Dongsuk;Lee, Hyowon;Yoo, In-Chul
    • The Journal of the Acoustical Society of Korea / v.39 no.6 / pp.505-514 / 2020
  • Since a large amount of training data is typically needed to train Deep Neural Networks (DNNs), a parallel training approach is required. The Stochastic Gradient Descent (SGD) algorithm is one of the most widely used methods to train DNNs. However, since SGD is an inherently sequential process, approximation schemes of some sort are required to parallelize it. In this paper, we review various efforts on parallelizing the SGD algorithm, and analyze the computational overhead, the communication overhead, and the effects of the approximations.
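A common baseline among such parallelization schemes is synchronous data-parallel SGD, in which workers compute gradients on disjoint shards and the averaged gradient is applied once per step. A minimal numpy sketch under assumptions of my own (the shard data and the least-squares per-worker gradient are illustrative, not from the survey):

```python
import numpy as np

def parallel_sgd_step(w, worker_batches, grad_fn, lr=0.01):
    """One synchronous data-parallel SGD step: each worker computes a
    gradient on its own shard, and the averaged gradient is applied once.
    Real systems run the workers concurrently and all-reduce the result;
    the mean below stands in for that communication step."""
    grads = [grad_fn(w, batch) for batch in worker_batches]
    return w - lr * np.mean(grads, axis=0)

# Toy usage: a least-squares gradient on four simulated worker shards.
def ls_grad(w, batch):
    X, y = batch
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(32, 3)), rng.normal(size=32)) for _ in range(4)]
w = np.zeros(3)
for _ in range(100):
    w = parallel_sgd_step(w, shards, ls_grad)
```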

Adaptive stochastic gradient method under two mixing heterogeneous models (두 이종 혼합 모형에서의 수정된 경사 하강법)

  • Moon, Sang Jun;Jeon, Jong-June
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1245-1255 / 2017
  • Online learning is the process of obtaining the solution of a given objective function as data accumulate in real time or in batch units. The stochastic gradient descent method is one of the methods most widely used for online learning. This method is not only easy to implement, but also has good solution properties under the assumption that the data-generating model is homogeneous. However, the stochastic gradient method can severely mislead online learning when this homogeneity is actually violated. We assume that there are two heterogeneous generating models behind the observations, and propose a new stochastic gradient method that mitigates the problem of heterogeneous models. We introduce a robust mini-batch optimization method using statistical tests and investigate the convergence radius of the solution of the proposed method. Moreover, the theoretical results are confirmed by numerical simulations.
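The abstract does not spell out the statistical test, so the sketch below is only one plausible reading: before each SGD step, a simple z-test on the residuals screens out observations that look like they came from the contaminating model. The linear model, the test statistic, and the threshold are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def robust_sgd_step(w, X, y, lr=0.01, z_crit=2.0):
    """One SGD step on a linear model that screens out observations whose
    residuals fail a simple z-test, a stand-in for the statistical test in
    the paper: samples flagged as coming from the other generating model
    are dropped before the gradient is formed."""
    resid = X @ w - y
    z = (resid - resid.mean()) / (resid.std() + 1e-12)
    keep = np.abs(z) < z_crit                # screen suspected heterogeneous samples
    grad = 2 * X[keep].T @ resid[keep] / max(int(keep.sum()), 1)
    return w - lr * grad
```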

Pan evaporation modeling using deep learning theory (Deep learning 이론을 이용한 증발접시 증발량 모형화)

  • Seo, Youngmin;Kim, Sungwon
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.392-395 / 2017
  • This study evaluates the applicability of deep learning models for estimating daily pan evaporation. The deep learning model applied here is a deep belief network (DBN)-based deep neural network (DNN) model (DBN-DNN). To evaluate its applicability, meteorological data measured at the Busan station were used, and a model was built for each input variable set (Set 1, Set 2, Set 3), considering combinations of the meteorological variables highly correlated with evaporation (solar radiation, sunshine duration, mean ground temperature, and maximum air temperature). The performance of the DBN-DNN model was evaluated using statistical performance indices (coefficient of efficiency, CE; coefficient of determination, $r^2$; root mean square error, RMSE; mean absolute error, MAE) and compared with two conventional types of ANN (artificial neural network), namely the ANN-SGD and ANN-GD models, trained with SGD (stochastic gradient descent) and GD (gradient descent), respectively. For effective model training, the hyperparameters of each model were optimized using a GA (genetic algorithm). As a result, the ANN-GD1 model for Set 1, the DBN-DNN2 model for Set 2, and the DBN-DNN3 model for Set 3 showed the best performance. Although the differences in performance among the compared models were not large, model efficiency was found to be best, across all input sets, for DBN-DNN3, followed by DBN-DNN2 and ANN-SGD3.
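The ANN-SGD versus ANN-GD distinction the study compares comes down to batch granularity during training. A minimal numpy sketch of one epoch of each on an assumed least-squares objective (the study's networks and data are not reproduced here):

```python
import numpy as np

def gd_epoch(w, X, y, lr=0.01):
    """Full-batch GD: one update per epoch from the gradient over all data."""
    return w - lr * 2 * X.T @ (X @ w - y) / len(y)

def sgd_epoch(w, X, y, lr=0.01, batch=32, seed=0):
    """SGD: many noisy updates per epoch, each from a small random batch."""
    perm = np.random.default_rng(seed).permutation(len(y))
    for start in range(0, len(y) - batch + 1, batch):
        idx = perm[start:start + batch]
        w = w - lr * 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
    return w
```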


Novel steepest descent adaptive filters derived from new performance function (새로운 성능지수 함수에 대한 직강하 적응필터)

  • 전병을;박동조
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 1992.10a / pp.823-828 / 1992
  • A novel steepest descent adaptive filter algorithm, which uses the instantaneous stochastic gradient as the steepest descent direction, is derived from a newly devised performance index function. The performance function for the new algorithm improves on that of the LMS, taking into account that the stochastic steepest descent method is used to minimize the performance index iteratively. Through mathematical analysis and computer simulations, it is verified that there are substantial improvements in convergence and misadjustment, while the computational simplicity and the robustness of the LMS algorithm are hardly sacrificed. The new algorithm can also be interpreted as a variable step size adaptive filter, and in this respect a heuristic method is proposed to reduce the noise caused by step size fluctuation.
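The baseline this work modifies is the standard LMS filter, whose update is exactly a steepest-descent step along the instantaneous stochastic gradient e[n]·x[n]. A minimal numpy sketch of that baseline only; the paper's modified performance index is not reproduced here, and the tap count and step size mu are assumed values.

```python
import numpy as np

def lms_filter(x, d, taps=8, mu=0.05):
    """Standard LMS adaptive filter: a steepest-descent step along the
    instantaneous stochastic gradient e[n] * x[n] at every sample.
    Returns the final weights and the error signal."""
    w = np.zeros(taps)
    e = np.zeros(len(d))
    for n in range(taps, len(d)):
        u = x[n - taps:n][::-1]              # most recent input samples, newest first
        e[n] = d[n] - w @ u                  # instantaneous error
        w += mu * e[n] * u                   # stochastic gradient update
    return w, e
```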


Drought index forecast using ensemble learning (앙상블 기법을 이용한 가뭄지수 예측)

  • Jeong, Jihyeon;Cha, Sanghun;Kim, Myojeong;Kim, Gwangseob;Lim, Yoon-Jin;Lee, Kyeong Eun
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1125-1132 / 2017
  • In a situation where the severity and frequency of drought events are increasing, many studies on drought forecasting have been conducted to improve forecast accuracy. However, it is difficult to predict drought events using a single model because of the nonlinear and complicated temporal behavior of drought events. In this study, in order to overcome the shortcomings of the single-model approach, we first build various single models capable of explaining the relationship between the meteorological drought index, the Standardized Precipitation Index (SPI), and independent variables such as world climate indices. Then, we develop a combined model using the Stochastic Gradient Descent method, an ensemble learning technique.
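The abstract does not specify how SGD combines the base models, so the following is only one plausible stacking-style reading: combination weights over the single models' forecasts are fit by SGD on squared error. The function name, shapes, and learning schedule are illustrative assumptions.

```python
import numpy as np

def fit_combination_weights(preds, y, lr=0.01, epochs=200, seed=0):
    """Fit weights that linearly combine base-model forecasts by running
    SGD on the squared forecast error. preds has one column per base
    model; the equal-weight start is an assumed initialization."""
    n, m = preds.shape
    w = np.full(m, 1.0 / m)                  # start from an equal-weight ensemble
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = preds[i] @ w - y[i]
            w -= lr * err * preds[i]         # stochastic gradient of squared error
    return w
```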

An Efficient Training of Multilayer Neural Networks Using Stochastic Approximation and Conjugate Gradient Method (확률적 근사법과 공액기울기법을 이용한 다층신경망의 효율적인 학습)

  • 조용현
    • Journal of the Korean Institute of Intelligent Systems / v.8 no.5 / pp.98-106 / 1998
  • This paper proposes an efficient learning algorithm for improving the training performance of neural networks. The proposed method improves training performance by applying to the backpropagation algorithm a global optimization method that is a hybrid of stochastic approximation and the conjugate gradient method. An approximate initial point for global optimization is first estimated by stochastic approximation, and then the conjugate gradient method, a fast gradient descent method, is applied for high-speed optimization. The proposed method has been applied to parity checking and pattern classification, and the simulation results show that its performance is superior to that of conventional backpropagation and of a backpropagation algorithm that hybridizes stochastic approximation with the steepest descent method.
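A two-phase sketch of such a hybrid, under assumptions of my own: a stochastic-approximation search with decaying gain locates an approximate initial point, then nonlinear conjugate gradient (Fletcher-Reeves here; the paper's exact variant is not stated in the abstract) refines it. The gain schedule, noise scale, and fixed CG step are all illustrative.

```python
import numpy as np

def hybrid_optimize(grad, w0, sa_iters=50, cg_iters=100, seed=0):
    """Phase 1: stochastic approximation finds a rough starting point.
    Phase 2: Fletcher-Reeves conjugate gradient refines it quickly."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    # Phase 1: stochastic approximation with decaying gain a_k = 1/k.
    for k in range(1, sa_iters + 1):
        w = w - (1.0 / k) * (grad(w) + rng.normal(scale=0.1, size=w.shape))
    # Phase 2: Fletcher-Reeves CG (a fixed step stands in for a line search).
    g = grad(w)
    d = -g
    for _ in range(cg_iters):
        w = w + 0.05 * d
        g_new = grad(w)
        beta = (g_new @ g_new) / (g @ g + 1e-12)
        d = -g_new + beta * d
        g = g_new
    return w
```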


A STOCHASTIC VARIANCE REDUCTION METHOD FOR PCA BY AN EXACT PENALTY APPROACH

  • Jung, Yoon Mo;Lee, Jae Hwa;Yun, Sangwoon
    • Bulletin of the Korean Mathematical Society / v.55 no.4 / pp.1303-1315 / 2018
  • For principal component analysis (PCA) to efficiently analyze large-scale matrices, it is crucial to find a few singular vectors at cheaper computational cost and under lower memory requirements. To compute those in a fast and robust way, we propose a new stochastic method. In particular, we adopt the stochastic variance reduced gradient (SVRG) method [11] to avoid the asymptotically slow convergence of stochastic gradient descent methods. For that purpose, we reformulate the PCA problem as an unconstrained optimization problem using a quadratic penalty. In general, increasing the penalty parameter to infinity is needed for the equivalence of the two problems; in this case, however, exact penalization is guaranteed by applying the analysis in [24]. We establish the convergence rate of the proposed method to a stationary point, and numerical experiments illustrate the validity and efficiency of the proposed method.
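The SVRG building block is independent of the PCA application: each epoch anchors a full gradient, and every inner step corrects a sampled gradient by its value at the anchor, shrinking the variance of the search direction. A generic numpy sketch for a finite-sum objective follows; the paper's penalized PCA objective is not reproduced, and the inner-loop length is a common heuristic assumed here.

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.01, epochs=10, inner=None, seed=0):
    """Generic SVRG loop: grad_i(w, i) returns the gradient of the i-th
    summand at w; the anchored correction keeps the stochastic direction
    unbiased while reducing its variance."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    inner = inner or 2 * n                   # assumed inner-loop length
    for _ in range(epochs):
        w_anchor = w.copy()
        full_grad = np.mean([grad_i(w_anchor, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            w = w - lr * (grad_i(w, i) - grad_i(w_anchor, i) + full_grad)
    return w
```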

Stochastic Gradient Descent Optimization Model for Demand Response in a Connected Microgrid

  • Sivanantham, Geetha;Gopalakrishnan, Srivatsun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.97-115 / 2022
  • The smart power grid is a user-friendly system that transforms the traditional electric grid into one that operates in a cooperative and reliable manner. Demand Response (DR) is one of the important components of the smart grid. DR programs enable end-user participation: users can communicate with the electricity service provider, shape their daily energy consumption patterns, and reduce their consumption costs. The increasing demand for electricity owing to a growing population stresses the need for optimal usage of electricity, and also for cheap, renewable, alternative sources. Solar and wind energy are at present the most promising alternative sources because of their renewable nature and low-cost implementation. The proposed work models a smart home with renewable energy units. The random nature of renewable sources such as wind and solar brings uncertainty to the developed model, and a stochastic dual descent optimization method is used to find an optimal solution. The proposed work is validated using simulation results. From the results it is concluded that the proposed work achieves balanced usage of grid power and the renewable energy units. The work also optimizes the daily consumption pattern, thereby reducing consumption costs for end users of electricity.
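The abstract only names a stochastic dual descent method, so the sketch below is a toy dual-subgradient update for a one-constraint caricature of the problem (cover demand with grid power plus random renewable output); the model, prices, and bounds are invented for illustration and the paper's microgrid formulation is far richer.

```python
import numpy as np

def stochastic_dual_descent(c_grid=0.2, demand=5.0, g_max=10.0,
                            lr=0.01, iters=2000, seed=0):
    """Toy stochastic dual (sub)gradient sketch: a multiplier lam prices
    the supply-meets-demand constraint; each iteration samples renewable
    output, lets the home respond to the current price, and updates the
    multiplier along the sampled constraint violation."""
    rng = np.random.default_rng(seed)
    lam = 0.0                                # dual variable on the balance constraint
    for _ in range(iters):
        r = rng.uniform(0.0, 6.0)            # sampled renewable availability
        g = g_max if lam > c_grid else 0.0   # buy grid power only if price warrants it
        lam = max(0.0, lam + lr * (demand - r - g))  # subgradient step on violation
    return lam
```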