• Title/Summary/Keyword: Q learning


Path Planning with Obstacle Avoidance Based on Double Deep Q Networks (이중 심층 Q 네트워크 기반 장애물 회피 경로 계획)

  • Yongjiang Zhao;Senfeng Cen;Seung-Je Seong;J.G. Hur;Chang-Gyoon Lim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.2
    • /
    • pp.231-240
    • /
    • 2023
  • It remains a challenge for robots to learn to avoid obstacles automatically during path planning with deep reinforcement learning (DRL). A growing number of researchers use DRL to train robots in simulated environments and verify that automatic obstacle avoidance is feasible, but because of differences among environments, robots, and sensors, it is rarely realized in real scenarios. To learn automatic path planning with obstacle avoidance in a real scene, we designed a simple testbed containing a wall and an obstacle and mounted a camera on the robot. The robot's goal is to travel from the start point to the end point as quickly as possible without hitting the wall. To let the robot learn to avoid the wall and the obstacle, we propose using double deep Q-networks (DDQN) and verify the feasibility of DRL for automatic obstacle avoidance. The robot used in the experiment is a Jetbot, and the approach can be applied to robot task scenarios that require obstacle avoidance in automated path planning.
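
For reference, a minimal sketch of the Double DQN bootstrap target that distinguishes DDQN from vanilla DQN; the network interfaces, discount factor, and transition format are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hedged sketch of the Double DQN bootstrap target; the network interfaces,
# discount factor, and transition format below are illustrative assumptions.

def ddqn_target(online_q, target_q, reward, next_state, done, gamma=0.99):
    """online_q / target_q: callables mapping a state to a vector of Q-values."""
    if done:
        return reward
    # The online network picks the greedy next action ...
    best_action = int(np.argmax(online_q(next_state)))
    # ... but its value is read from the target network, which is what
    # reduces the overestimation bias of vanilla DQN.
    return reward + gamma * target_q(next_state)[best_action]
```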

Strategy of Object Search for Distributed Autonomous Robotic Systems

  • Kim Ho-Duck;Yoon Han-Ul;Sim Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.6 no.3
    • /
    • pp.264-269
    • /
    • 2006
  • This paper presents a strategy for searching for a hidden object in an unknown area using multiple distributed autonomous robotic systems (DARS). To search for the target in a Markovian space, the DARS robots must recognize their surroundings at their current locations and generate rules to act on by themselves. First, each robot obtains six distances to the environment from infrared sensors arranged hexagonally around it. Second, it computes six areas from those distances and then takes an action, i.e., turns and moves toward the direction where the widest space is guaranteed. After the action is taken, the Q-value of the state is updated by the corresponding formula, as sketched after this entry. We set up an experimental environment with five small mobile robots, obstacles, and a target object, and searched for the target while navigating an unknown hallway in which obstacles were placed. At the end of the paper, we present results for three algorithms: a random search, an area-based action-making process that determines the robot's next action, and hexagon-based Q-learning that enhances the area-based action-making process.
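
A hypothetical sketch of the hexagon-based Q-learning step described above, assuming the six hexagonally arranged directions serve as actions; the state encoding and all parameter values are illustrative:

```python
import random
from collections import defaultdict

# Illustrative hexagon-based Q-learning step: six hexagonally arranged
# directions serve as actions; state encoding and parameters are assumptions.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(lambda: [0.0] * 6)          # Q[state] -> value per direction

def choose_direction(state):
    if random.random() < EPSILON:
        return random.randrange(6)                       # explore
    return max(range(6), key=lambda a: Q[state][a])      # most promising direction

def update(state, action, reward, next_state):
    # Standard one-step Q-learning update applied after the robot moves.
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```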

Action Selection by Voting with Learning Capability for a Behavior-based Control Approach (행동기반 제어방식을 위한 득점과 학습을 통한 행동선택기법)

  • Jeong, S.M.;Oh, S.R.;Yoon, D.Y.;You, B.J.;Chung, C.C.
    • Proceedings of the KIEE Conference
    • /
    • 2002.11c
    • /
    • pp.163-168
    • /
    • 2002
  • The voting algorithm for action selection improves itself with a reinforcement learning algorithm in a dynamic environment. The proposed voting algorithm improves the robot's navigation by adapting the eligibility of the behaviors and by driving the Command Set Generator (CGS). The Navigator that uses the proposed voting algorithm interacts with the CGS, assigning weight values and receiving reward values. It must decide which command set controls the mobile robot at a given time and select among the candidate actions. The command set is learned online by means of Q-learning; the Action Selector compares the Q-values of the Navigator with those of the heterogeneous behaviors, as illustrated below. Finally, real-world experiments were carried out. The results show good performance in command-set selection as well as convergence of the Q-values.
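
A hedged illustration of how Q-learned weights could drive voting over candidate command sets, in the spirit of the abstract; the data structures and names are assumptions, not the paper's implementation:

```python
from collections import defaultdict

# Illustrative voting scheme: each behavior casts a vote for every candidate
# command set, weighted by a learned Q-value; all names are assumptions.

ALPHA, GAMMA = 0.2, 0.9
Q = defaultdict(float)                      # (behavior, command_set) -> value

def select_command_set(behaviors, candidate_sets):
    votes = {c: sum(Q[(b, c)] for b in behaviors) for c in candidate_sets}
    return max(votes, key=votes.get)        # command set with most weighted votes

def update(behavior, command_set, reward, best_next_value):
    key = (behavior, command_set)
    Q[key] += ALPHA * (reward + GAMMA * best_next_value - Q[key])
```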


Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology
    • /
    • v.54 no.9
    • /
    • pp.3283-3292
    • /
    • 2022
  • Based on the Deep Q-Network (DQN) reinforcement learning algorithm, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) subject to sensor faults. We first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent updates its network with the rewards obtained during the interaction. In this way, the active fault-tolerant control of the OTSG is transformed into the agent's decision-making process. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the proposed active fault-tolerant controller can control the plant accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.
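
A rough sketch of the incremental-action idea: the agent selects a small increment rather than an absolute control output, and the increments accumulate toward the set-point. The increment set, bounds, and q_network interface are assumptions for illustration:

```python
import numpy as np

# Sketch of incremental-action selection: the agent picks a small increment
# and the controller output is accumulated and clipped. The increment set,
# bounds, and q_network interface are assumptions for illustration.

INCREMENTS = np.array([-0.02, -0.01, 0.0, 0.01, 0.02])   # assumed action set

def apply_incremental_action(q_network, state, current_output, low=0.0, high=1.0):
    q_values = q_network(state)             # one Q-value per candidate increment
    action = int(np.argmax(q_values))       # greedy choice at deployment time
    new_output = float(np.clip(current_output + INCREMENTS[action], low, high))
    return action, new_output
```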

Q-learning for Adaptive LQ Suboptimal Control of Discrete-time Switched Linear System (이산 시간 스위칭 선형 시스템의 적응 LQ 준최적 제어를 위한 Q-학습법)

  • Chun, Tae-Yoon;Choi, Yoon-Ho;Park, Jin-Bae
    • Proceedings of the KIEE Conference
    • /
    • 2011.07a
    • /
    • pp.1874-1875
    • /
    • 2011
  • This paper proposes a Q-learning algorithm for adaptive LQ suboptimal control of switched linear systems. The proposed control algorithm is based on an existing Q-learning method whose stability has been proven, and it achieves suboptimal control even when the parameters of the switched system model are unknown. Building on this algorithm, we address the uncertainty of each subsystem and the optimal adaptive control problem, which had not previously been considered for switched systems, and we verify the performance of the proposed algorithm through computer simulations.
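
For context, a minimal sketch of the general model-free LQ Q-learning idea (not the paper's specific switched-system algorithm): the Q-function is a quadratic form Q(x, u) = [x; u]^T H [x; u], and the greedy feedback gain follows from the blocks of the learned kernel H:

```python
import numpy as np

# Generic LQ Q-learning relationship (an assumption-labelled sketch, not the
# paper's switched-system algorithm): with Q(x, u) = [x; u]^T H [x; u], the
# greedy control is u = -K x with K derived from the blocks of H.

def greedy_gain(H, n_x):
    """Return K such that u = -K x, given the learned symmetric kernel H."""
    H_uu = H[n_x:, n_x:]
    H_ux = H[n_x:, :n_x]
    return np.linalg.solve(H_uu, H_ux)      # K = H_uu^{-1} H_ux
```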


Radial Basis Function Neural Networks (RBFNN) and p-q Power Theory Based Harmonic Identification in Converter Waveforms

  • Almaita, Eyad K.;Asumadu, Johnson A.
    • Journal of Power Electronics
    • /
    • v.11 no.6
    • /
    • pp.922-930
    • /
    • 2011
  • In this paper, two radial basis function neural networks (RBFNNs) are used to dynamically identify the harmonic content in converter waveforms based on the p-q (real power-imaginary power) theory. The converter waveforms are analyzed and the types of harmonic content are identified over a wide operating range. Constant-power and sinusoidal-current compensation strategies are investigated. The RBFNN filter training algorithm is based on a systematic and computationally efficient method called the hybrid learning method. In this methodology, the RBFNN is combined with the p-q theory to extract the harmonic content of converter waveforms. The small size and robustness of the resulting network models reflect the effectiveness of the algorithm. The analysis is verified using MATLAB simulations.
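
A minimal sketch of evaluating a Gaussian RBF network of the kind used for harmonic identification; the centers, widths, and weights stand in for parameters that the hybrid learning method would actually fit:

```python
import numpy as np

# Minimal Gaussian RBF network evaluation; centers, widths, and weights stand
# in for parameters that the hybrid learning method would actually fit.

def rbf_network(x, centers, widths, weights):
    """x: input vector; centers: (m, d); widths: (m,); weights: (m,) or (m, k)."""
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * widths ** 2))
    return phi @ weights                     # weighted sum of basis responses
```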

Research of Foresight Knowledge by CMAC based Q-learning in Inhomogeneous Multi-Agent System

  • Hoshino, Yukinobu;Sakakura, Akira;Kamei, Katsuari
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.280-283
    • /
    • 2003
  • The purpose of our research is the acquisition of cooperative behaviors in an inhomogeneous multi-agent system. We use the fire panic problem as the experimental environment: a fire exists in the environment, advances with each step of the agents' behavior, and spreads according to a fixed law. Each agent must reach a designated goal without touching the fire, which heats up over a few steps and is uncertain from the agent's point of view, so the agent has to avoid the spreading fire while acquiring the behavior needed to reach the goal. In this paper, we observe how agents escape from the fire by cooperating with other agents, and for this problem we propose a unique CMAC-based Q-learning system for inhomogeneous multi-agent systems.
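
A hedged sketch of CMAC (tile-coding) based Q-learning as referenced above: each state activates one tile per tiling and the Q-value is the sum of the active weights. The tiling layout and parameters are illustrative assumptions:

```python
import numpy as np

# Hedged CMAC (tile-coding) Q-learning sketch: each state activates one tile
# per tiling and Q is the sum of the active weights. Layout and parameters
# are illustrative assumptions.

N_TILINGS, TILES_PER_TILING, N_ACTIONS = 4, 64, 4
ALPHA, GAMMA = 0.1 / N_TILINGS, 0.95
weights = np.zeros((N_TILINGS * TILES_PER_TILING, N_ACTIONS))

def active_tiles(x, y):
    """Toy hashing of a 2D grid position into one tile index per tiling."""
    return [t * TILES_PER_TILING + (x + 3 * y + t) % TILES_PER_TILING
            for t in range(N_TILINGS)]

def q_value(tiles, action):
    return weights[tiles, action].sum()

def q_update(tiles, action, reward, next_tiles):
    target = reward + GAMMA * max(q_value(next_tiles, a) for a in range(N_ACTIONS))
    weights[tiles, action] += ALPHA * (target - q_value(tiles, action))
```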


Development of reinforcement learning algorithm with continuous action selection for acrobot (Acrobot 제어를 위한 강화학습에서의 연속적인 행위 선택 알고리즘의 개발)

  • Seo, Sung-Hwan;Jang, Si-Young;Suh, Il-Hong
    • Proceedings of the KIEE Conference
    • /
    • 2003.07d
    • /
    • pp.2387-2389
    • /
    • 2003
  • The acrobot is a representative nonlinear, underactuated system, and its control objectives are swing-up control and balancing control. Much previous work has addressed these two objectives, but those methods switch between two independent controllers according to the state of the acrobot, which makes it difficult to choose the switching point and delays the overall learning needed to achieve both objectives. To improve on this, we study a single controller that handles both control objectives simultaneously, based on our previously studied Region-based Q-Learning [11], which can approximate a continuous state space. The usefulness of the proposed method is verified through experiments on an acrobot that we built.


Design and Development of m-Learning Service Based on 3G Cellular Phones

  • Chung, Kwang-Sik;Lee, Jeong-Eun
    • Journal of Information Processing Systems
    • /
    • v.8 no.3
    • /
    • pp.521-538
    • /
    • 2012
  • As the knowledge society matures, not only distance-learning universities but also off-line universities are trying to provide learners with online educational content. In particular, the effectiveness of mobile devices for e-Learning has been demonstrated by the university sector, which uses distance learning based on blended learning. In this paper, we analyzed previous m-Learning scenarios and future technology prospects. Based on the proposed m-Learning scenario, we designed cellular phone-based educational content and the service structure, implemented an m-Learning system, and analyzed m-Learning service satisfaction. The design principles of the m-Learning service are 1) to provide learners with an m-Learning environment on both cellular phones and desktop computers; 2) to serve announcements, discussion boards, Q&A boards, course materials, and exercises on cellular phones and desktop computers; and 3) to support learning activities such as reviewing full lectures, holding discussions, and writing term papers using desktop computers and cellular phones. The m-Learning service was developed on a cellular phone that supports the H.264 codec over 3G communication technology. Some functions of the m-Learning design principles are implemented on a 3G cellular phone. Lecture content is provided as video, text, audio, and video with text. One-way educational content is complemented by exercises (quizzes).

L-CAA : An Architecture for Behavior-Based Reinforcement Learning (L-CAA : 행위 기반 강화학습 에이전트 구조)

  • Hwang, Jong-Geun;Kim, In-Cheol
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.3
    • /
    • pp.59-76
    • /
    • 2008
  • In this paper, we propose an agent architecture called L-CAA that is effective in real-time dynamic environments. L-CAA is an extension of CAA, a behavior-based agent architecture also developed by our research group, extended with reinforcement learning capability to improve adaptability to a changing environment. To obtain stable performance, however, behavior selection and execution in L-CAA do not rely entirely on learning; learning is used merely as a complementary means for behavior selection and execution. The behavior selection mechanism consists of two phases. In the first phase, candidate behaviors are extracted from the behavior library by checking the user-defined applicability conditions and the utility of each behavior. If multiple behaviors are extracted, a single behavior is selected for execution with the help of reinforcement learning in the second phase: the behavior with the highest expected reward is chosen by comparing the Q-values of the individual behaviors updated through reinforcement learning, as sketched below. L-CAA monitors the maintenance conditions of the executing behavior and stops it immediately when some of the conditions fail due to dynamic changes in the environment. Additionally, L-CAA can suspend and later resume the current behavior whenever it encounters a behavior with higher utility. To analyze the effectiveness of the L-CAA architecture, we implement an L-CAA-enabled agent that plays autonomously in Unreal Tournament, a well-known dynamic virtual environment, and conduct several experiments with it.
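
A hypothetical sketch of the second-phase selection: among the behaviors that passed the applicability check, pick the one with the highest learned Q-value (names and the exploration rate are assumptions, not the L-CAA implementation):

```python
import random

# Hypothetical second-phase behavior selection: among applicable behaviors,
# choose the one with the highest learned Q-value. Names and the exploration
# rate are assumptions, not the L-CAA implementation.

def select_behavior(applicable_behaviors, q_values, epsilon=0.05):
    """applicable_behaviors: list of behavior names; q_values: name -> Q."""
    if not applicable_behaviors:
        return None
    if random.random() < epsilon:            # occasional exploration
        return random.choice(applicable_behaviors)
    return max(applicable_behaviors, key=lambda b: q_values.get(b, 0.0))
```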
