• Title/Summary/Keyword: TD($\lambda$)-learning


Goal-Directed Reinforcement Learning System (목표지향적 강화학습 시스템)

  • Lee, Chang-Hoon / The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.5 / pp.265-270 / 2010
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. Consequently, in dynamic environments, reinforcement learning methods such as TD-learning and TD($\lambda$)-learning learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms grant the reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the GDRLS algorithm for finding the shortest path faster in a maze environment. GDRLS selects the candidate states that can guide the shortest path in the maze environment and learns only those candidate states. Through experiments, we show that GDRLS can find the shortest path faster than TD-learning and TD($\lambda$)-learning in maze environments.
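
The abstract does not detail GDRLS itself, so as background, here is a minimal sketch of the tabular TD($\lambda$) baseline it is compared against: state-value estimation with accumulating eligibility traces in a maze. The `step_fn`, `actions`, and parameter values are illustrative assumptions, not the paper's setup.

```python
import random

def td_lambda(episodes, start, goal, step_fn, actions,
              alpha=0.1, gamma=0.95, lam=0.8):
    """Tabular TD(lambda) state-value estimation with accumulating
    eligibility traces; step_fn(state, action) -> (next_state, reward).
    The maze interface and parameter values are assumptions."""
    V = {}          # state -> estimated value
    for _ in range(episodes):
        E = {}      # eligibility traces, reset each episode
        s = start
        while s != goal:
            a = random.choice(actions)          # exploratory policy
            s2, r = step_fn(s, a)
            delta = r + gamma * V.get(s2, 0.0) - V.get(s, 0.0)
            E[s] = E.get(s, 0.0) + 1.0          # accumulate trace
            for x in list(E):                   # broadcast TD error to traced states
                V[x] = V.get(x, 0.0) + alpha * delta * E[x]
                E[x] *= gamma * lam             # decay traces
            s = s2
    return V
```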

Applying Neuro-fuzzy Reasoning to Go Opening Games (뉴로-퍼지 추론을 적용한 포석 바둑)

  • Lee, Byung-Doo / Journal of Korea Game Society / v.9 no.6 / pp.117-125 / 2009
  • This paper describes the result of applying neuro-fuzzy reasoning, which derives Go term knowledge from pattern knowledge, to the opening game of Go. We discuss the implementation of neuro-fuzzy reasoning for deciding the best next move throughout the opening game. We also let neuro-fuzzy reasoning play against TD($\lambda$) learning to test its performance. The experimental results reveal that even this simple neuro-fuzzy reasoning model can compete against TD($\lambda$) learning, and it shows great potential for application to the real game of Go.
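
The paper's neuro-fuzzy model is not specified in the abstract. Purely as an illustration of the general idea (fuzzy rules over board-pattern features, with rule weights tuned by gradient descent supplying the "neuro" part), here is a hypothetical sketch; `gauss`, `score_move`, `train_step`, and the feature encoding are all invented for this example.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian membership function for a fuzzy set."""
    return math.exp(-((x - mu) / sigma) ** 2)

def score_move(features, rules, weights):
    """Weighted fuzzy rule firing: each rule is a list of
    (feature_index, mu, sigma); its firing strength is the
    product of memberships (fuzzy AND)."""
    total = 0.0
    for w, rule in zip(weights, rules):
        strength = 1.0
        for i, mu, sigma in rule:
            strength *= gauss(features[i], mu, sigma)
        total += w * strength
    return total

def train_step(features, target, rules, weights, lr=0.05):
    """The 'neuro' part: one gradient step on squared error,
    nudging rule weights toward a teacher score for this move."""
    err = score_move(features, rules, weights) - target
    for k, rule in enumerate(rules):
        strength = 1.0
        for i, mu, sigma in rule:
            strength *= gauss(features[i], mu, sigma)
        weights[k] -= lr * err * strength
    return weights
```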

Online Reinforcement Learning to Search the Shortest Path in Maze Environments (미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습)

  • Kim, Byeong-Cheon; Kim, Sam-Geun; Yun, Byeong-Ju / The KIPS Transactions: Part B / v.9B no.2 / pp.155-162 / 2002
  • Reinforcement learning is a learning method that uses trial-and-error interaction with dynamic environments. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system (ONRELS: ONline REinforcement Learning System). ONRELS updates the estimated value of every selectable (state, action) pair before making a state transition at the current state. ONRELS learns by interacting with the compressed environment through trial-and-error after compressing the state space of the maze environment. Through experiments, we show that ONRELS can find the shortest path faster than Q-learning using the TD-error and $Q(\lambda)$-learning using TD($\lambda$) in maze environments.
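
For reference on the $Q(\lambda)$ baseline the abstract compares against, the following is a minimal sketch of Watkins's $Q(\lambda)$: Q-learning with eligibility traces that are cut after an exploratory (non-greedy) action. The maze interface (`step_fn`, `actions`) and parameters are assumptions.

```python
import random

def q_lambda(episodes, start, goal, step_fn, actions,
             alpha=0.1, gamma=0.95, lam=0.8, eps=0.1):
    """Watkins's Q(lambda) with epsilon-greedy exploration;
    step_fn(state, action) -> (next_state, reward)."""
    Q, E = {}, {}

    def best(s):
        return max(actions, key=lambda a: Q.get((s, a), 0.0))

    for _ in range(episodes):
        E.clear()
        s = start
        while s != goal:
            greedy = best(s)
            a = random.choice(actions) if random.random() < eps else greedy
            s2, r = step_fn(s, a)
            delta = r + gamma * Q.get((s2, best(s2)), 0.0) - Q.get((s, a), 0.0)
            E[(s, a)] = E.get((s, a), 0.0) + 1.0
            for key in list(E):
                Q[key] = Q.get(key, 0.0) + alpha * delta * E[key]
                E[key] *= gamma * lam
            if a != greedy:      # exploratory move: cut all traces
                E.clear()
            s = s2
    return Q
```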

A Localized Adaptive QoS Routing using TD($\lambda$) method (TD($\lambda$) 기법을 사용한 지역적이며 적응적인 QoS 라우팅 기법)

  • Han, Jeong-Soo / The Journal of Korean Institute of Communications and Information Sciences / v.30 no.5B / pp.304-309 / 2005
  • In this paper, we propose a localized adaptive QoS routing scheme using the TD method and evaluate the performance of various exploration methods for path selection. In particular, extensive simulation shows that the proposed routing algorithm with an Exploration Bonus significantly reduces the overall blocking probability compared to other path selection (exploration) methods, because the proposed exploration method adapts better to the network environment when selecting a path.
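
The abstract does not give the exact bonus form. One common choice (as in Dyna-Q+) adds a term that grows with the time since a path was last tried, so the router keeps probing alternatives instead of locking onto one route. The sketch below assumes that form; all names and parameters are illustrative.

```python
import math

def select_path(paths, q_value, last_tried, now, kappa=0.3):
    """Exploration-bonus path selection: rank each candidate path by
    its estimated value plus a bonus that grows with elapsed time
    since it was last tried. q_value and last_tried are dicts keyed
    by path id; now is the current decision epoch (now >= last_tried)."""
    def score(p):
        bonus = kappa * math.sqrt(now - last_tried.get(p, 0))
        return q_value.get(p, 0.0) + bonus
    return max(paths, key=score)

def update_path(p, reward, q_value, last_tried, now, alpha=0.1):
    """After attempting path p (e.g., reward 1 = call admitted,
    0 = blocked), nudge its value estimate and record the trial time."""
    q = q_value.get(p, 0.0)
    q_value[p] = q + alpha * (reward - q)
    last_tried[p] = now
```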

Capacitated Fab Scheduling Approximation using Average Reward TD($\lambda$) Learning based on System Feature Functions (시스템 특성함수 기반 평균보상 TD($\lambda$) 학습을 통한 유한용량 Fab 스케줄링 근사화)

  • Choi, Jin-Young / Journal of Korean Society of Industrial and Systems Engineering / v.34 no.4 / pp.189-196 / 2011
  • In this paper, we propose a logical-control-based actor-critic algorithm as an efficient approach to approximating the capacitated fab scheduling problem. We apply the average-reward temporal-difference learning method to estimate the relative value functions of system states, while avoiding deadlock with Banker's algorithm. We consider the Intel mini-fab re-entrant line to evaluate the suggested algorithm and perform a numerical experiment on randomly generated sample system configurations. We show that the suggested method performs markedly better than other well-known heuristics.
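
As context for the critic side, here is a minimal sketch of average-reward TD($\lambda$) with linear function approximation over system feature functions, the value-estimation method the abstract names. The feature map `phi`, the trajectory interface, and the step sizes are assumptions; the paper's logical-control actor and its Banker's-algorithm deadlock avoidance are not modeled here.

```python
def avg_reward_td_lambda(trajectory, phi, n_features,
                         alpha=0.05, eta=0.01, lam=0.7):
    """Average-reward TD(lambda) with linear function approximation:
    learns weights w for the relative (differential) value function
    h(s) ~= w . phi(s) together with a running estimate rho of the
    average reward. trajectory yields (state, reward, next_state)
    tuples; phi(state) -> list of n_features floats."""
    w = [0.0] * n_features
    z = [0.0] * n_features          # eligibility trace over features
    rho = 0.0                       # average-reward estimate
    for s, r, s2 in trajectory:
        f, f2 = phi(s), phi(s2)
        v = sum(wi * fi for wi, fi in zip(w, f))
        v2 = sum(wi * fi for wi, fi in zip(w, f2))
        delta = r - rho + v2 - v    # differential TD error
        rho += eta * delta          # track the average reward
        for i in range(n_features):
            z[i] = lam * z[i] + f[i]    # no discount in the average-reward setting
            w[i] += alpha * delta * z[i]
    return w, rho
```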