• Title/Abstract/Keywords: optimal learning

Search results: 1,187 items

강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현 (Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning)

  • 박찬건;양성봉
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 30, No. 7/8
    • /
    • pp.672-680
    • /
    • 2003
  • A shopbot is a software agent that maximizes consumer satisfaction by automatically collecting price and quality information on products from online sellers. In response to shopbots, sellers on the Internet will need pricebots, agents that maximize the sellers' profits. In this paper, Q-learning, a model-free reinforcement learning method, is used as the pricebot's pricing algorithm. A Q-learned agent can increase profit and reduce cyclic price wars compared with agents that use myopically optimal (myoptimal) pricing strategies. During Q-learning, a sequence of state-action pairs must be selected for the learning to converge. When Uniform Random Selection (URS) is used for this, the number of Q-table accesses required to converge to the optimal values grows sharply, so URS is unsuitable for universal online learning in real-world environments. This happens because URS leaves the exploitation of the optimal policy uncertain. This paper therefore proposes a Mixed Nonstationary Policy (MNP), composed of an auxiliary Markov process and the original Markov process. A Q-learning agent under MNP learns with the stationary greedy policy determined by Q-learning while executing the original controlled process, so that the optimal policy evaluated on the auxiliary Markov process and the original controlled process is exploited with probability 1, resolving the exploitation uncertainty that URS suffers from. Across a range of experiments, the proposed method converged to the optimal Q-values about 2.6 times faster than URS on average, showing that a Q-learning agent under MNP is capable of universal online Q-learning.
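
The abstract contrasts uniform random selection with exploitation-aware selection. As a point of reference only, here is a minimal tabular Q-learning loop with an ε-greedy rule; the paper's MNP policy and its auxiliary Markov process are not reproduced, and the environment interface (`n_states`, `n_actions`, `reset`, `step`) is a hypothetical stand-in.

```python
import random

# Minimal tabular Q-learning sketch (not the paper's MNP policy).
# The environment interface (n_states, n_actions, reset, step) is hypothetical.
def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Uniform Random Selection (URS) corresponds to epsilon = 1.0;
            # an epsilon-greedy policy mixes exploration with exploitation.
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda x: Q[s][x])
            s2, r, done = env.step(a)
            # One-step Q-learning update toward the greedy backup value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```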

Application Study of Reinforcement Learning Control for Building HVAC System

  • Cho, Sung-Hwan
    • International Journal of Air-Conditioning and Refrigeration
    • /
    • Vol. 14, No. 4
    • /
    • pp.138-146
    • /
    • 2006
  • Recently, technology based on proportional-integral (PI) control has grown rapidly, owing to the industrial and building sectors' need for robust controllers. However, a PI controller generally requires gain tuning to remain optimal when the outside weather conditions change. The present study examines the feasibility of a reinforcement learning (RL) control algorithm combined with a PI controller in an HVAC system. Optimal design criteria for the RL controller were proposed from environment-chamber experiments, and a theoretical analysis was also conducted using the TRNSYS program.
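
A minimal sketch of the PI control law the abstract builds on, assuming a hypothetical `plant` object; how the RL layer would select the gains is only indicated in a closing comment, since the paper's coupling of RL and PI is not detailed here.

```python
# Discrete-time PI control loop; the `plant` object (measure/apply) and the
# way an RL agent would pick (kp, ki) are illustrative assumptions, not the
# paper's actual design.
def pi_control(plant, setpoint, kp, ki, dt=60.0, steps=100):
    integral = 0.0
    y = plant.measure()                  # initial measurement
    trace = []
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral   # PI control law
        y = plant.apply(u)               # actuate and read the new output
        trace.append(y)
    return trace
```

An RL layer could treat candidate (kp, ki) pairs as actions and use the negative accumulated tracking error as reward, re-tuning the gains as the outside weather condition changes.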

동적 신경망에 기초한 불확실한 로봇 시스템의 적응 최적 학습제어기 (DNN-Based Adaptive Optimal Learning Controller for Uncertain Robot Systems)

  • 정재욱;국태용;이택종
    • 전자공학회논문지S
    • /
    • Vol. 34S, No. 6
    • /
    • pp.1-10
    • /
    • 1997
  • This paper presents an adaptive optimal learning controller for uncertain robot systems which makes use of simple DNN (dynamic neural network) units to estimate uncertain parameters and learn the unknown desired optimal input. With the aid of a Lyapunov function, it is shown that all error signals in the system are bounded and that the robot trajectory converges to the desired one globally and exponentially. The effectiveness of the proposed controller is shown by applying it to a 2-DOF robot manipulator.
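
The paper's DNN units and Lyapunov analysis are not reproduced here; the sketch below shows only the generic gradient-type adaptation law such controllers build on, for a plant output that is linear in the uncertain parameters. The callables `phi` and `measure` are hypothetical placeholders.

```python
import numpy as np

# Generic adaptive-law sketch: estimate theta in y = phi(x) @ theta online.
# This is a standard gradient update, not the paper's DNN-based controller.
def adaptive_estimate(phi, measure, theta_dim, gain=0.5, dt=0.01, steps=1000):
    theta_hat = np.zeros(theta_dim)
    for k in range(steps):
        x = k * dt
        y = measure(x)                       # measured output (uncertain plant)
        y_hat = phi(x) @ theta_hat           # predicted output
        e = y - y_hat                        # estimation error
        theta_hat += gain * e * phi(x) * dt  # gradient adaptation law
    return theta_hat
```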

공조설비 최적 정지시각 결정에 관한 연구 (A Study on Determining the Optimal Stop Time of HVAC System)

  • 양인호
    • 설비공학논문집
    • /
    • Vol. 13, No. 1
    • /
    • pp.30-37
    • /
    • 2001
  • The purpose of this study is to present a method for determining the optimal stop time of an HVAC system using an Artificial Neural Network (ANN) model, one of the learning methods. To this end, the performance of determining the HVAC stop time for learning data not seen during training was evaluated, and the measurement time interval for input data and the permissible error needed for practical application of the ANN model were presented using the results of daily simulations.
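
As a rough illustration of the approach (not the paper's model), the sketch below fits a small MLP regressor to predict a stop-time lead from a few features; the feature set, network size, and the dummy numbers are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative only: the features (outdoor temp, indoor temp, thermal load)
# and all numbers below are made-up stand-ins, not the paper's data.
X = np.array([[30.2, 26.1, 0.8],
              [28.5, 25.4, 0.6],
              [32.0, 27.0, 0.9]])          # input conditions
y = np.array([35.0, 22.0, 48.0])           # minutes of early stop allowed

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)
print(model.predict([[29.0, 25.8, 0.7]]))  # predicted early-stop lead time
```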

A Study on Determining the Optimal Stop Time of a Heating System

  • Yang, In-Ho
    • International Journal of Air-Conditioning and Refrigeration
    • /
    • Vol. 13, No. 1
    • /
    • pp.22-30
    • /
    • 2005
  • The purpose of this study is to present a method to determine the optimal stop time of an HVAC system using the Artificial Neural Network (ANN) model, which is one of the learning methods. To this end, the performance of determining the HVAC stop time for learning data not seen during training was evaluated, and the measurement time interval for input data and the permissible error needed for practical application of the ANN model were presented using the results from daily simulations.

Reinforcement Learning Algorithm Using Domain Knowledge

  • Jang, Si-Young;Hong, Suh-Il;Kong, Sung-Hak;Oh, Sang-Rok
    • 제어로봇시스템학회:학술대회논문집
    • /
    • ICCAS 2001
    • /
    • pp.173.5-173
    • /
    • 2001
  • Q-Learning is the most widely used reinforcement learning algorithm; it addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goal in a given problem. Q-Learning can acquire optimal control strategies from delayed rewards, even when the agent has no prior knowledge of the effects of its actions on the environment. If the agent is able to use prior knowledge, it can be expected to speed up learning while interacting with the environment. We present a novel reinforcement learning method using domain knowledge, which is represented by problem-independent features and their classifiers; here, neural networks are employed as the knowledge classifiers. To show that an agent using domain knowledge can perform better than an agent with a standard Q-learner, computer simulations are ...
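
The abstract does not specify how the feature classifiers feed into Q-learning; one common way to inject domain knowledge, sketched below under that assumption, is to bias the initial Q-table with a heuristic score. The `knowledge_score` function is a hypothetical stand-in for the paper's neural classifiers.

```python
# Sketch: injecting domain knowledge into Q-learning by biasing the initial
# Q-table. `knowledge_score` is a hypothetical stand-in for the paper's
# feature-based classifiers; it maps (state, action) to a prior preference.
def init_q_with_knowledge(n_states, n_actions, knowledge_score, scale=0.5):
    return [[scale * knowledge_score(s, a) for a in range(n_actions)]
            for s in range(n_states)]
```

A standard Q-learning loop (such as the one sketched after the first entry above) can then start from this table instead of zeros, so that early action choices are already steered by the prior knowledge.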

가변학습율과 온라인모드를 이용한 개선된 EBP 알고리즘 (Improved Error Backpropagation by Elastic Learning Rate and Online Update)

  • Lee, Tae-Seung;Park, Ho-Jin
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2004 Spring Conference Proceedings, Vol. 31, No. 1 (B)
    • /
    • pp.568-570
    • /
    • 2004
  • The error-backpropagation (EBP) algorithm for training multilayer perceptrons (MLPs) is known for its robustness and economical efficiency. However, the algorithm has difficulty selecting an optimal constant learning rate, which results in non-optimal learning speed and inflexible operation on working data. To remedy this non-optimality, this paper introduces into the original EBP algorithm an elastic learning rate that guarantees convergence of learning, together with its local realization by online update of the MLP parameters. Results of experiments on a speaker verification system with a Korean speech database are presented and discussed to demonstrate the improvement of the proposed method over the original EBP algorithm in terms of learning speed and flexibility on working data.
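
The exact elastic-rate rule is not given in the abstract; the sketch below uses a bold-driver-style adjustment inside an online (per-pattern) update loop to illustrate the idea. `grad` and `loss_fn` are hypothetical callables.

```python
# Online EBP sketch with a bold-driver-style "elastic" learning rate.
# `grad` and `loss_fn` are hypothetical callables; the paper's exact
# rate-adaptation rule is not given in the abstract.
def online_ebp(params, grad, loss_fn, samples, rate=0.01):
    prev_loss = float("inf")
    for x, t in samples:                 # online mode: update per pattern
        g = grad(params, x, t)           # backpropagated gradient
        params = [p - rate * gi for p, gi in zip(params, g)]
        loss = loss_fn(params, x, t)
        # Elastic rate: grow while the loss falls, cut sharply when it rises.
        rate = rate * 1.05 if loss < prev_loss else rate * 0.5
        prev_loss = loss
    return params
```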

Particle Swarm Optimization based on Vector Gaussian Learning

  • Zhao, Jia;Lv, Li;Wang, Hui;Sun, Hui;Wu, Runxiu;Nie, Jugen;Xie, Zhifeng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 4
    • /
    • pp.2038-2057
    • /
    • 2017
  • Gaussian learning is a new technique in the computational intelligence area. However, it can weaken the learning ability of a particle swarm and lead to a lack of diversity. This paper therefore proposes a vector Gaussian learning strategy and presents an effective approach named particle swarm optimization based on vector Gaussian learning. The strategy adopts vector Gaussian learning to generate Gaussian solutions around the swarm's optimal location, strengthens learning from the swarm's optimal location, and maintains the diversity of the swarm. The method divides the swarm's states into normal and premature states by analyzing a state threshold: if the swarm is in the premature state, the algorithm combines a linearly decreasing inertia weight with vector Gaussian learning; otherwise, it uses a fixed inertia weight. Experiments on eight well-known benchmark functions show that with the vector Gaussian learning strategy the algorithm comes closer to the optimal solution and searches more efficiently; the results demonstrate promising performance in terms of convergence velocity and precision, with an improved ability to escape from local optima.
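
A minimal PSO loop with a Gaussian-perturbed global best, sketched to make the "Gaussian learning" idea concrete; the paper's per-dimension vector variant, state-threshold test, and parameter settings are not reproduced, and the constants below are common defaults rather than the paper's.

```python
import numpy as np

# Minimal PSO sketch with a Gaussian-perturbed global best.
def pso_gaussian(f, dim, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, lo=-5, hi=5):
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (n, dim))
    v = np.zeros((n, dim))
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    g = pbest[pval.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        g = pbest[pval.argmin()].copy()
        g_try = g + rng.normal(0.0, 0.1, dim)   # Gaussian solution near gbest
        if f(g_try) < f(g):                     # keep it only if it improves
            g = g_try
    return g, f(g)
```

For example, `pso_gaussian(lambda p: float(np.sum(p**2)), dim=10)` minimizes the sphere function.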

Influence on overfitting and reliability due to change in training data

  • Kim, Sung-Hyeock;Oh, Sang-Jin;Yoon, Geun-Young;Jung, Yong-Gyu;Kang, Min-Soo
    • International Journal of Advanced Culture Technology
    • /
    • Vol. 5, No. 2
    • /
    • pp.82-89
    • /
    • 2017
  • The range of problems that machine learning can handle has expanded rapidly with the rise of big data and the development of hardware, and techniques such as deep learning have become very versatile. In this paper, the MNIST data set is used as experimental data and the cross-entropy function as the loss model for evaluating the efficiency of machine learning; we applied the GradientDescentOptimizer algorithm to minimize the value of the loss function by steepest descent and updated the weights and biases via backpropagation. In this way, we analyzed the optimal reliability value as a function of the number of training iterations, that is, the optimal reliability value attainable without overfitting, and compared the onset of overfitting as the training data changed, based on the number of training runs. When the training frequency was 1,110, we obtained a result of 92%, the optimal reliability value without overfitting.
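
A self-contained sketch of the experiment's ingredients, assuming random stand-in data rather than MNIST: softmax regression trained by gradient descent on a cross-entropy loss, with a held-out set monitored to detect the onset of overfitting via a crude early stop.

```python
import numpy as np

# Cross-entropy loss minimized by gradient descent, with a held-out set to
# spot overfitting. The data below is a random stand-in, not MNIST.
rng = np.random.default_rng(0)
Xtr, ytr = rng.normal(size=(200, 64)), rng.integers(0, 10, 200)
Xva, yva = rng.normal(size=(50, 64)), rng.integers(0, 10, 50)
W, b, lr = np.zeros((64, 10)), np.zeros(10), 0.1

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

best_va = np.inf
for step in range(2000):
    p = softmax(Xtr @ W + b)
    # Softmax cross-entropy gradient: (p - onehot(y)) / N
    p[np.arange(len(ytr)), ytr] -= 1.0
    g = p / len(ytr)
    W -= lr * Xtr.T @ g                 # backpropagated weight update
    b -= lr * g.sum(axis=0)
    va = -np.log(softmax(Xva @ W + b)[np.arange(len(yva)), yva]).mean()
    if va < best_va:
        best_va = va                    # validation loss still improving
    elif step > 100:
        break                           # rising validation loss: overfitting
```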

NETLA를 이용한 이진 신경회로망의 최적 합성방법 (Optimal Synthesis Method for Binary Neural Network using NETLA)

  • 성상규;김태우;박두환;조현우;하홍곤;이준탁
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2001 Summer Conference Proceedings, Vol. D
    • /
    • pp.2726-2728
    • /
    • 2001
  • This paper describes an optimal synthesis method for binary neural networks (BNNs) on the approximation problem of a circular region, using a newly proposed learning algorithm [7]. Our objective is to minimize the number of connections and hidden-layer neurons by using a Newly Expanded and Truncated Learning Algorithm (NETLA) for the multilayer BNN. The synthesis method in NETLA is based on the extension principle of Expanded and Truncated Learning (ETL) and on the Expanded Sum of Products (ESP), one of the Boolean expression techniques. It can optimize a given BNN in the binary space without the iterative training required by the conventional error backpropagation (EBP) algorithm [6]. If only the true and false patterns are given, the connection weights and threshold values can be determined immediately by the optimal synthesis method of NETLA, without tedious learning. Furthermore, the number of required hidden-layer neurons can be reduced and fast learning of the BNN can be realized. The superiority of NETLA over other algorithms is demonstrated on the approximation problem of a circular region.
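
NETLA's synthesis procedure is not reproduced here; the sketch below only illustrates the kind of network being synthesized: a two-layer hard-threshold network whose hand-set weights approximate a circular region as the intersection of k half-planes.

```python
import numpy as np

# Hand-built two-layer hard-threshold network approximating a circular region
# by intersecting k half-planes (a regular polygon). NETLA would synthesize
# the weights and thresholds directly from true/false patterns; this sketch
# only fixes them by hand to show the network structure.
def make_polygon_net(k=8, radius=1.0):
    angles = 2 * np.pi * np.arange(k) / k
    W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # hidden weights
    t = np.full(k, radius)                                  # hidden thresholds
    def net(x, y):
        hidden = (W @ np.array([x, y]) <= t).astype(int)    # half-plane tests
        return int(hidden.sum() >= k)                       # output: AND of all
    return net

inside = make_polygon_net()
print(inside(0.2, 0.3), inside(1.2, 0.0))   # expect 1 (inside), 0 (outside)
```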
