• Title/Summary/Keyword: reinforcement learning


Reinforcement learning Speedup method using Q-value Initialization (Q-value Initialization을 이용한 Reinforcement Learning Speedup Method)

  • 최정환
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.13-16
    • /
    • 2001
  • In reinforcement learning, Q-learning converges quite slowly to a good policy because searching for the goal state takes a very long time in a large stochastic domain. I therefore propose a speedup method using Q-value initialization for model-free reinforcement learning. The speedup method learns a naive model of the domain and builds boundaries around the goal state. Using these boundaries, it assigns initial Q-values to the state-action pairs and then runs Q-learning from those initial values. The initial Q-values guide the agent toward the goal state in the early stages of learning, so that Q-learning updates Q-values efficiently. The method therefore saves the exploration time spent searching for the goal state and performs better than plain Q-learning. I present the Speedup Q-learning algorithm to implement this method; the algorithm is evaluated in a grid-world domain and compared to Q-learning. (A hedged sketch follows this entry.)

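The boundary-based initialization lends itself to a small illustration. Below is a minimal sketch assuming a deterministic 10x10 grid world; the Manhattan-distance "boundary" rule, the seed value, and all constants are my assumptions, not the paper's exact algorithm.

```python
import random

SIZE = 10                                     # 10x10 grid world
GOAL = (9, 9)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def init_q(boundary_radius=2, seed_value=1.0):
    """Assign optimistic initial Q-values to state-action pairs whose states
    fall inside a Manhattan-distance boundary around the goal (assumed rule)."""
    q = {}
    for x in range(SIZE):
        for y in range(SIZE):
            near = abs(x - GOAL[0]) + abs(y - GOAL[1]) <= boundary_radius
            for a in ACTIONS:
                q[((x, y), a)] = seed_value if near else 0.0
    return q

def step(state, action):
    """Deterministic grid transitions; reward 1 at the goal, small step cost."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    return (x, y), (1.0 if (x, y) == GOAL else -0.01)

def q_learning(q, episodes=200):
    """Standard epsilon-greedy Q-learning started from the seeded Q-table."""
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r = step(s, a)
            best_next = max(q[(s2, a2)] for a2 in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q_learning(init_q())  # compare against q_learning(init_q(0, 0.0)) as a plain-Q baseline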

A Function Approximation Method for Q-learning of Reinforcement Learning (강화학습의 Q-learning을 위한 함수근사 방법)

  • 이영아;정태충
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.11
    • /
    • pp.1431-1438
    • /
    • 2004
  • Reinforcement learning learns policies for accomplishing a task's goal from experience gained through interaction between the agent and the environment. Q-learning, the basic algorithm of reinforcement learning, suffers from the curse of dimensionality and slow learning speed in the early stage of learning. To address these problems, new function approximation methods suited to reinforcement learning need to be studied. In this paper, we propose the Fuzzy Q-Map algorithm, which is based on online fuzzy clustering. Fuzzy Q-Map is a function approximation method suited to reinforcement learning that can learn online and express the uncertainty of the environment. We experimented on the mountain car problem with Fuzzy Q-Map, and the results show that learning speed is accelerated in the early stage of learning. (A hedged sketch follows this entry.)
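
As a rough illustration of a fuzzy-clustering Q-function approximator in the spirit of Fuzzy Q-Map: states are soft-assigned to clusters, Q-values are stored per cluster, and new clusters are grown online when no existing cluster matches. The vigilance test, membership formula, and update rule below are assumptions, not the paper's exact definitions.

```python
import numpy as np

class FuzzyQMap:
    """Q-function approximator over online-grown fuzzy clusters (sketch)."""

    def __init__(self, n_actions, vigilance=0.5, alpha=0.1, gamma=0.99):
        self.centers, self.q = [], []       # cluster centers, per-cluster Q-rows
        self.n_actions = n_actions
        self.vigilance, self.alpha, self.gamma = vigilance, alpha, gamma

    def memberships(self, s):
        d = np.array([np.linalg.norm(s - c) for c in self.centers])
        w = 1.0 / (d + 1e-6)                # closer clusters get more weight
        return w / w.sum()

    def value(self, s):
        if not self.centers:
            return np.zeros(self.n_actions)
        return self.memberships(s) @ np.array(self.q)   # membership-weighted Q

    def update(self, s, a, r, s2):
        s, s2 = np.asarray(s, float), np.asarray(s2, float)
        # Grow a new cluster online when no center is close enough (vigilance test).
        if not self.centers or min(np.linalg.norm(s - c) for c in self.centers) > self.vigilance:
            self.centers.append(s.copy())
            self.q.append(np.zeros(self.n_actions))
        m = self.memberships(s)
        target = r + self.gamma * self.value(s2).max()  # one-step Q-learning target
        td = target - self.value(s)[a]
        for i in range(len(self.q)):        # credit each cluster by its membership
            self.q[i][a] += self.alpha * m[i] * td
```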

Online Reinforcement Learning to Search the Shortest Path in Maze Environments (미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습)

  • Kim, Byeong-Cheon;Kim, Sam-Geun;Yun, Byeong-Ju
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.155-162
    • /
    • 2002
  • Reinforcement learning is a learning method that uses trial and error to learn by interacting with dynamic environments. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system (ONRELS: ONline REinforcement Learning System). ONRELS updates the estimated value of every selectable (state, action) pair before making a state transition at the current state. After compressing the state space of the maze environment, ONRELS learns by trial-and-error interaction with the compressed environment. Experiments show that ONRELS can search the shortest path faster than Q-learning using TD-error and $Q(\lambda)$-learning using $TD(\lambda)$ in maze environments. (A hedged sketch follows this entry.)
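
A hedged sketch of the update scheme as the abstract describes it: before leaving the current state, the estimates of all selectable (state, action) pairs at that state are refreshed, and only then does the agent transition. The maze representation (a dict mapping each state's actions to (next state, reward) in the compressed space) is an assumption of mine.

```python
import random

def onrels_episode(q, maze, start, goal, alpha=0.2, gamma=0.95, eps=0.1):
    """One episode; maze[s] maps each action at state s to (next_state, reward)
    in the compressed state space (assumed representation)."""
    s = start
    while s != goal:
        # Refresh the estimate of every selectable (state, action) pair at s
        # *before* making the state transition, as the abstract describes.
        for a, (s2, r) in maze[s].items():
            best_next = max((q.get((s2, a2), 0.0) for a2 in maze[s2]), default=0.0)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
        # Then transition, epsilon-greedily, using the freshly updated estimates.
        actions = list(maze[s])
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: q.get((s, act), 0.0))
        s = maze[s][a][0]
    return q
```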

Credit-Assigned-CMAC-based Reinforcement Learning with Application to the Acrobot Swing Up Control Problem (Acrobot Swing Up Control을 위한 Credit-Assigned-CMAC-based 강화학습)

  • 장시영;신연용;서승환;서일홍
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.7
    • /
    • pp.517-524
    • /
    • 2004
  • For real-world applications of reinforcement learning techniques, function approximation or generalization is required to avoid the curse of dimensionality. To this end, an improved function approximation-based reinforcement learning method is proposed that speeds up convergence by using CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation and experiment results are presented for the swing-up control problem of an acrobot. (A hedged sketch follows this entry.)
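
A sketch of a credit-assigned CMAC update. The idea illustrated, weighting each active tile's share of the correction by how rarely it has been trained, is my paraphrase of credit assignment in CA-CMAC; the tiling scheme and the 1/(1+count) credit formula are assumptions.

```python
import numpy as np

class CACMAC:
    """CMAC (tile coding) with per-tile training counts for credit assignment."""

    def __init__(self, n_tilings=8, tiles_per_dim=10, dim=2, lr=0.5):
        self.n_tilings, self.tiles, self.lr = n_tilings, tiles_per_dim, lr
        self.w = np.zeros((n_tilings, tiles_per_dim ** dim))  # tile weights
        self.counts = np.zeros_like(self.w)                   # training counts

    def _active(self, x):
        """Active tile index in each offset tiling, for x in [0, 1]^dim."""
        x = np.asarray(x, float)
        idx = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.tiles)
            coords = np.clip(((x + offset) * self.tiles).astype(int), 0, self.tiles - 1)
            idx.append(int(np.ravel_multi_index(coords, (self.tiles,) * len(x))))
        return idx

    def predict(self, x):
        return sum(self.w[t, i] for t, i in enumerate(self._active(x)))

    def train(self, x, target):
        active = self._active(x)
        err = target - self.predict(x)
        c = np.array([self.counts[t, i] for t, i in enumerate(active)])
        credit = 1.0 / (1.0 + c)          # rarely-trained tiles get more credit
        credit /= credit.sum()
        for (t, i), cr in zip(enumerate(active), credit):
            self.w[t, i] += self.lr * cr * err
            self.counts[t, i] += 1
```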

Adaptive Learning Control of Neural Network Using Real-Time Evolutionary Algorithm (실시간 진화 알고리듬을 통한 신경망의 적응 학습제어)

  • Chang, Sung-Ouk;Lee, Jin-Kul
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.26 no.6
    • /
    • pp.1092-1098
    • /
    • 2002
  • This paper combines reinforcement learning, which is suited to real-time learning, with the evolutionary strategy, which has proven superior at finding optimal solutions as an off-line learning method. The population is reduced so that the evolutionary strategy can learn in real time, and a new method that guarantees the convergence of evolutionary mutations is proposed. This makes it possible to control a plant that varies over time: as each state value of the plant is generated, the evolutionary strategy applies its cycle of evaluation, selection, and mutation at every sampling time. With these algorithms, designers without knowledge of the technical tuning of dynamic systems can still build controllers, even for problems in which the system dynamics vary slightly over time. Future work should verify the theory through experiments and examine robustness against external disturbances. (A hedged sketch follows this entry.)
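
A compact (1+1)-evolution-strategy sketch of the real-time idea: with the population cut down, one mutant is evaluated per sampling period and kept only if it reduces the tracking error. The toy first-order plant and the tiny controller parameterization are invented stand-ins, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant(u, x):
    """Toy first-order plant, purely illustrative."""
    return 0.9 * x + 0.1 * u

def control(w, err):
    """Tiny controller: proportional plus cubic error term."""
    return w[0] * err + w[1] * err ** 3

w, sigma, x = np.zeros(2), 0.5, 0.0
for k in range(500):                              # one ES step per sampling instant
    ref = np.sin(0.05 * k)                        # time-varying reference to track
    err = ref - x
    trial = w + sigma * rng.standard_normal(2)    # mutation: a single offspring
    # Selection: compare parent and offspring on a one-step plant prediction.
    if (ref - plant(control(trial, err), x)) ** 2 < (ref - plant(control(w, err), x)) ** 2:
        w = trial
    x = plant(control(w, err), x)                 # apply the surviving controller
```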

Tunnel Ventilation Controller Design Employing RLS-Based Natural Actor-Critic Algorithm (RLS 기반의 Natural Actor-Critic 알고리즘을 이용한 터널 환기제어기 설계)

  • Chu B.;Kim D.;Hong D.;Park J.;Chung J.T.;Kim T.H.
    • Proceedings of the Korean Society of Precision Engineering Conference
    • /
    • 2006.05a
    • /
    • pp.53-54
    • /
    • 2006
  • The main purpose of a tunnel ventilation system is to keep the CO pollutant level and VI (visibility index) at an adequate level to provide drivers with safe driving conditions, while minimizing the power consumed to operate the system. To achieve these objectives, the control algorithm used in this research is reinforcement learning (RL). RL is goal-directed learning of a mapping from situations to actions; its goal is to maximize a reward, an evaluative feedback signal from the environment. Both objectives listed above are built into the reward of the tunnel ventilation system. An RL algorithm based on the actor-critic architecture and the natural gradient method is applied to the system, and recursive least-squares (RLS) is employed in the learning process to improve data efficiency. Simulation results obtained with real data collected from an existing tunnel are provided in this paper. They confirm that with the suggested controller the pollutant level inside the tunnel is kept under the allowable limit, and that energy consumption improves compared to a conventional control scheme. (A hedged reward sketch follows this entry.)

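The reward construction is concrete enough to sketch. Below is one plausible way to fold both objectives (pollutant/visibility limits and power consumption) into a scalar reward; the limits, weights, and penalty form are my assumptions, not the published controller's reward.

```python
def ventilation_reward(co_ppm, visibility, fan_power_kw,
                       co_limit=25.0, vi_limit=0.5,
                       w_co=1.0, w_vi=1.0, w_power=0.01):
    """Scalar reward: pay for energy always, penalize constraint violations."""
    reward = -w_power * fan_power_kw
    if co_ppm > co_limit:                   # CO above the allowable limit
        reward -= w_co * (co_ppm - co_limit)
    if visibility < vi_limit:               # visibility index below the limit
        reward -= w_vi * (vi_limit - visibility)
    return reward
```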

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

  • 박찬건;양성봉
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.672-680
    • /
    • 2003
  • A shopbot is a software agent whose goal is to maximize the buyer's satisfaction by automatically gathering price and quality information about goods, as well as the services offered by on-line sellers. In response to shopbots' activities, sellers on the Internet need agents, called pricebots, that can help them maximize their own profits. In this paper we adopt Q-learning, a model-free reinforcement learning method, as the price-setting algorithm of pricebots. A Q-learned agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs in order to converge. When state-action pairs are selected uniformly at random, the number of accesses to the Q-table needed to obtain the optimal Q-values is quite large, so this selection method is not appropriate for universal on-line learning in a real-world environment. This happens because uniform random selection reflects the uncertainty of exploitation with respect to the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of an auxiliary Markov process together with the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than uniform random selection. (A hedged sketch follows this entry.)
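
A hedged reading of the Mixed Nonstationary Policy as code: actions come from an auxiliary exploratory process early on and shift toward the greedy policy of the original process as learning progresses. The decaying mixing schedule and uniform auxiliary process below are assumptions, not the paper's construction.

```python
import random

def mnp_action(q, state, actions, t, tau=500.0):
    """Mix an auxiliary exploratory process with greedy exploitation; the
    exploratory share decays with time t, making the policy nonstationary."""
    p_explore = tau / (tau + t)             # assumed decaying mixing weight
    if random.random() < p_explore:
        return random.choice(actions)       # auxiliary process (uniform here)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```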

FLNN-Based Friction Compensation Controller for XY Tables (FLNN에 기초한 XY Table용 마찰 보상 제어기)

  • Chung, Chae-Wook;Kim, Young-Ho;Kuc, Tae-Yong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.8 no.2
    • /
    • pp.113-119
    • /
    • 2002
  • An FLNN-based neural network controller is applied to the precise positioning of an XY table with friction, as an extension of the study in [11]. The neural network identifies the frictional forces of the table. Its weight adaptation rule, named the reinforcement adaptive learning rule, is derived from Lyapunov stability theory. Experimental results with a 2-DOF XY table verify the effectiveness of the proposed control scheme, and the proposed control approach is expected to be applicable to a wide class of mechanical systems. (A hedged sketch follows this entry.)
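
A sketch of an FLNN-style friction compensator. The functional-link basis over velocity and the error-driven weight update (a common form for Lyapunov-derived adaptive laws) are assumptions on my part; the paper derives its own reinforcement adaptive learning rule.

```python
import numpy as np

def flnn_basis(v):
    """Functional-link expansion of velocity: Coulomb-like, viscous, smooth terms."""
    return np.array([np.sign(v), v, np.tanh(10.0 * v)])

class FrictionCompensator:
    def __init__(self, gain=0.05):
        self.w = np.zeros(3)        # FLNN weights modeling the friction force
        self.gain = gain            # adaptation gain

    def compensate(self, v):
        """Feedforward friction estimate to add to the position-loop output."""
        return float(self.w @ flnn_basis(v))

    def adapt(self, v, tracking_error):
        """Error-driven update: push weights to cancel friction that still
        shows up as tracking error (assumed form of the adaptive law)."""
        self.w += self.gain * tracking_error * flnn_basis(v)
```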

Reinforcement Learning Approach to Agents Dynamic Positioning in Robot Soccer Simulation Games

  • Kwon, Ki-Duk;Kim, In-Cheol
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 2001.10a
    • /
    • pp.321-324
    • /
    • 2001
  • The robot soccer simulation game is a dynamic multi-agent environment. In this paper we suggest a new reinforcement learning approach to each agent's dynamic positioning in such a dynamic environment. Reinforcement learning is machine learning in which an agent learns, from indirect and delayed reward, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. Reinforcement learning therefore differs from supervised learning in that there is no presentation of input-output pairs as training examples. Furthermore, model-free reinforcement learning algorithms like Q-learning do not require defining or learning any model of the surrounding environment, yet they can learn the optimal policy if the agent can visit every state-action pair infinitely often. The biggest problem with monolithic reinforcement learning, however, is that straightforward applications do not scale up to more complex environments, due to the intractably large space of states. To address this problem, we suggest Adaptive Mediation-based Modular Q-Learning (AMMQL) as an improvement of the existing Modular Q-Learning (MQL). While simple modular Q-learning combines the results from each learning module in a fixed way, AMMQL combines them more flexibly by assigning a different weight to each module according to its contribution to rewards. In addition to resolving the large-state-space problem effectively, AMMQL can therefore adapt to environmental changes better than pure MQL. This paper introduces the concept of AMMQL and presents the details of its application to the dynamic positioning of robot soccer agents. (A hedged sketch of the combination step follows this entry.)

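A sketch of the combination step the abstract describes: each module's Q-row is merged with weights that track the module's contribution to reward, instead of MQL's fixed combination. The weight-update rule below is an assumption, not the paper's exact mediation scheme.

```python
import numpy as np

class AMMQL:
    """Mediator that merges module Q-values with adaptive weights (sketch)."""

    def __init__(self, n_modules, beta=0.05):
        self.weights = np.ones(n_modules) / n_modules   # mediation weights
        self.beta = beta

    def act(self, module_qs):
        """module_qs: (n_modules, n_actions) array, each module's Q-row for
        the current state; returns the action chosen from the weighted merge."""
        merged = self.weights @ np.asarray(module_qs)   # flexible merge (vs. MQL's fixed sum)
        return int(np.argmax(merged))

    def update_weights(self, module_rewards):
        """Shift mediation weight toward modules that contributed more reward."""
        self.weights += self.beta * np.asarray(module_rewards, float)
        self.weights = np.clip(self.weights, 1e-3, None)  # keep every module alive
        self.weights /= self.weights.sum()                # renormalize
```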

e-commerce Agents using Reinforcement Learning (강화 학습을 이용한 전자 상거래 에이전트)

  • 윤지현;김일곤
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.5_6
    • /
    • pp.579-586
    • /
    • 2003
  • Agents fit the e-commerce domain well because they pursue autonomy and interact with dynamic environments. In this paper we propose e-commerce agents using reinforcement learning. We modify a reinforcement learning algorithm so that agents gain intelligent behavior and can carry out transactions as practical business entities on behalf of a person. To show the validity of this approach, we classify agents into buying agents and selling agents, and assign them levels according to their degree of learning and communication. Finally, we implement an e-commerce framework and show the results. This paper presents a design of e-commerce agents based on the proposed learning algorithm, and shows that the agents are well capable of carrying out transactions in practical e-commerce. (A toy sketch follows this entry.)
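
As a toy illustration of an RL trading agent in this spirit, the sketch below Q-learns which posted price maximizes a selling agent's profit; the demand model and price grid are invented stand-ins for the paper's framework, not its algorithm.

```python
import random

PRICES = [round(0.5 + 0.1 * i, 1) for i in range(10)]   # candidate prices
q = {p: 0.0 for p in PRICES}                            # value of posting each price

def profit(price, competitor_price=0.9, cost=0.4):
    """Invented demand: the buyer picks us only if we undercut the competitor."""
    return (price - cost) if price <= competitor_price else 0.0

for t in range(2000):
    p = random.choice(PRICES) if random.random() < 0.1 else max(q, key=q.get)
    q[p] += 0.1 * (profit(p) - q[p])                    # stateless Q-update on observed profit

print(max(q, key=q.get))                                # learned selling price
```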