• Title/Summary/Keyword: TD($\lambda$)-learning


Goal-Directed Reinforcement Learning System (목표지향적 강화학습 시스템)

  • Lee, Chang-Hoon / The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.5 / pp.265-270 / 2010
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. Consequently, in dynamic environments, reinforcement learning methods such as TD-learning and TD($\lambda$)-learning learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms grant the reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the GDRLS algorithm for finding the shortest path faster in a maze environment. GDRLS selects the candidate states that can guide the shortest path in the maze environment and learns only those candidate states. Through experiments, we show that GDRLS can find the shortest path faster than TD-learning and TD($\lambda$)-learning in maze environments.
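
The abstract does not detail GDRLS itself, so as background, here is a minimal sketch of the tabular TD($\lambda$) baseline it is compared against: state-value estimation with accumulating eligibility traces in a maze. The `step_fn`, `actions`, and parameter values are illustrative assumptions, not the paper's setup.

```python
import random

def td_lambda(episodes, start, goal, step_fn, actions,
              alpha=0.1, gamma=0.95, lam=0.8):
    """Tabular TD(lambda) state-value estimation with accumulating
    eligibility traces; step_fn(state, action) -> (next_state, reward).
    The maze interface and parameter values are assumptions."""
    V = {}          # state -> estimated value
    for _ in range(episodes):
        E = {}      # eligibility traces, reset each episode
        s = start
        while s != goal:
            a = random.choice(actions)          # exploratory policy
            s2, r = step_fn(s, a)
            delta = r + gamma * V.get(s2, 0.0) - V.get(s, 0.0)
            E[s] = E.get(s, 0.0) + 1.0          # accumulate trace
            for x in list(E):                   # broadcast TD error to traced states
                V[x] = V.get(x, 0.0) + alpha * delta * E[x]
                E[x] *= gamma * lam             # decay traces
            s = s2
    return V
```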

Applying Neuro-fuzzy Reasoning to Go Opening Games (뉴로-퍼지 추론을 적용한 포석 바둑)

  • Lee, Byung-Doo / Journal of Korea Game Society / v.9 no.6 / pp.117-125 / 2009
  • This paper describes the result of applying neuro-fuzzy reasoning, which derives Go term knowledge from pattern knowledge, to the opening game of Go. We discuss the implementation of neuro-fuzzy reasoning for deciding the best next move throughout the opening game. We also let neuro-fuzzy reasoning play against TD($\lambda$) learning to test its performance. The experimental results reveal that even this simple neuro-fuzzy reasoning model can compete against TD($\lambda$) learning, and it shows great potential for application to the real game of Go.
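
The paper's neuro-fuzzy model is not specified in the abstract. Purely as an illustration of the general idea (fuzzy rules over board-pattern features, with rule weights tuned by gradient descent supplying the "neuro" part), here is a hypothetical sketch; `gauss`, `score_move`, `train_step`, and the feature encoding are all invented for this example.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian membership function for a fuzzy set."""
    return math.exp(-((x - mu) / sigma) ** 2)

def score_move(features, rules, weights):
    """Weighted fuzzy rule firing: each rule is a list of
    (feature_index, mu, sigma); its firing strength is the
    product of memberships (fuzzy AND)."""
    total = 0.0
    for w, rule in zip(weights, rules):
        strength = 1.0
        for i, mu, sigma in rule:
            strength *= gauss(features[i], mu, sigma)
        total += w * strength
    return total

def train_step(features, target, rules, weights, lr=0.05):
    """The 'neuro' part: one gradient step on squared error,
    nudging rule weights toward a teacher score for this move."""
    err = score_move(features, rules, weights) - target
    for k, rule in enumerate(rules):
        strength = 1.0
        for i, mu, sigma in rule:
            strength *= gauss(features[i], mu, sigma)
        weights[k] -= lr * err * strength
    return weights
```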

Online Reinforcement Learning to Search the Shortest Path in Maze Environments (미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습)

  • Kim, Byeong-Cheon; Kim, Sam-Geun; Yun, Byeong-Ju / The KIPS Transactions: Part B / v.9B no.2 / pp.155-162 / 2002
  • Reinforcement learning is a learning method that uses trial-and-error interaction with dynamic environments. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system (ONRELS: ONline REinforcement Learning System). ONRELS updates the estimated value of every selectable (state, action) pair before making a state transition at the current state. ONRELS learns by interacting with the compressed environment through trial-and-error after compressing the state space of the maze environment. Through experiments, we show that ONRELS can find the shortest path faster than Q-learning using the TD-error and $Q(\lambda)$-learning using TD($\lambda$) in maze environments.
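
For reference on the $Q(\lambda)$ baseline the abstract compares against, the following is a minimal sketch of Watkins's $Q(\lambda)$: Q-learning with eligibility traces that are cut after an exploratory (non-greedy) action. The maze interface (`step_fn`, `actions`) and parameters are assumptions.

```python
import random

def q_lambda(episodes, start, goal, step_fn, actions,
             alpha=0.1, gamma=0.95, lam=0.8, eps=0.1):
    """Watkins's Q(lambda) with epsilon-greedy exploration;
    step_fn(state, action) -> (next_state, reward)."""
    Q, E = {}, {}

    def best(s):
        return max(actions, key=lambda a: Q.get((s, a), 0.0))

    for _ in range(episodes):
        E.clear()
        s = start
        while s != goal:
            greedy = best(s)
            a = random.choice(actions) if random.random() < eps else greedy
            s2, r = step_fn(s, a)
            delta = r + gamma * Q.get((s2, best(s2)), 0.0) - Q.get((s, a), 0.0)
            E[(s, a)] = E.get((s, a), 0.0) + 1.0
            for key in list(E):
                Q[key] = Q.get(key, 0.0) + alpha * delta * E[key]
                E[key] *= gamma * lam
            if a != greedy:      # exploratory move: cut all traces
                E.clear()
            s = s2
    return Q
```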

A Localized Adaptive QoS Routing using TD($\lambda$) method (TD($\lambda$) 기법을 사용한 지역적이며 적응적인 QoS 라우팅 기법)

  • Han, Jeong-Soo / The Journal of Korean Institute of Communications and Information Sciences / v.30 no.5B / pp.304-309 / 2005
  • In this paper, we propose a localized adaptive QoS routing scheme using the TD method and evaluate the performance of various exploration methods for path selection. In particular, extensive simulation shows that the proposed routing algorithm with an Exploration Bonus significantly reduces the overall blocking probability compared to other path selection (exploration) methods, because the proposed exploration method adapts better to the network environment when selecting a path.
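
The abstract does not give the exact bonus form. One common choice (as in Dyna-Q+) adds a term that grows with the time since a path was last tried, so the router keeps probing alternatives instead of locking onto one route. The sketch below assumes that form; all names and parameters are illustrative.

```python
import math

def select_path(paths, q_value, last_tried, now, kappa=0.3):
    """Exploration-bonus path selection: rank each candidate path by
    its estimated value plus a bonus that grows with elapsed time
    since it was last tried. q_value and last_tried are dicts keyed
    by path id; now is the current decision epoch (now >= last_tried)."""
    def score(p):
        bonus = kappa * math.sqrt(now - last_tried.get(p, 0))
        return q_value.get(p, 0.0) + bonus
    return max(paths, key=score)

def update_path(p, reward, q_value, last_tried, now, alpha=0.1):
    """After attempting path p (e.g., reward 1 = call admitted,
    0 = blocked), nudge its value estimate and record the trial time."""
    q = q_value.get(p, 0.0)
    q_value[p] = q + alpha * (reward - q)
    last_tried[p] = now
```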

Capacitated Fab Scheduling Approximation using Average Reward TD($\lambda$) Learning based on System Feature Functions (시스템 특성함수 기반 평균보상 TD($\lambda$) 학습을 통한 유한용량 Fab 스케줄링 근사화)

  • Choi, Jin-Young / Journal of Korean Society of Industrial and Systems Engineering / v.34 no.4 / pp.189-196 / 2011
  • In this paper, we propose a logical-control-based actor-critic algorithm as an efficient approach to approximating the capacitated fab scheduling problem. We apply the average-reward temporal-difference learning method to estimate the relative value functions of system states, while avoiding deadlock with Banker's algorithm. We consider the Intel mini-fab re-entrant line to evaluate the suggested algorithm and perform a numerical experiment on randomly generated sample system configurations. We show that the suggested method performs markedly better than other well-known heuristics.
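
As context for the critic side, here is a minimal sketch of average-reward TD($\lambda$) with linear function approximation over system feature functions, the value-estimation method the abstract names. The feature map `phi`, the trajectory interface, and the step sizes are assumptions; the paper's logical-control actor and its Banker's-algorithm deadlock avoidance are not modeled here.

```python
def avg_reward_td_lambda(trajectory, phi, n_features,
                         alpha=0.05, eta=0.01, lam=0.7):
    """Average-reward TD(lambda) with linear function approximation:
    learns weights w for the relative (differential) value function
    h(s) ~= w . phi(s) together with a running estimate rho of the
    average reward. trajectory yields (state, reward, next_state)
    tuples; phi(state) -> list of n_features floats."""
    w = [0.0] * n_features
    z = [0.0] * n_features          # eligibility trace over features
    rho = 0.0                       # average-reward estimate
    for s, r, s2 in trajectory:
        f, f2 = phi(s), phi(s2)
        v = sum(wi * fi for wi, fi in zip(w, f))
        v2 = sum(wi * fi for wi, fi in zip(w, f2))
        delta = r - rho + v2 - v    # differential TD error
        rho += eta * delta          # track the average reward
        for i in range(n_features):
            z[i] = lam * z[i] + f[i]    # no discount in the average-reward setting
            w[i] += alpha * delta * z[i]
    return w, rho
```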