• Title/Summary/Keyword: Q learning

426 search results

Behavior Strategies of Robot Soccer Agent by Reinforcement Learning (강화 학습에 의한 로봇축구 에이전트 행동 전략)

  • Choe, So-Ra;Lee, Seung-Gwan;Lee, Young-Ah;Chung, Tae-Choong
    • Proceedings of the Korea Information Processing Society Conference / 2005.11a / pp.465-468 / 2005
  • Reinforcement learning is a technique by which an agent discovers its optimal behavior through trial and error in a dynamic environment. In particular, model-free reinforcement learning such as Q-learning requires no prior model of the environment and can reach an optimal behavior strategy if the agent experiences a sufficient variety of states and actions, so it has been applied in many fields. In this paper, Q-learning is used to control robot behavior efficiently. A robot soccer system is a time-varying environment in which the ball and several robots move from moment to moment, so modeling it is quite complex. The goal of robot soccer is to move the ball close to the goal, but sometimes there may be strategies more effective than unconditionally sending the ball toward the goal. We implement a system in which the robot learns to judge for itself which action in which situation is better in the long run, and we analyze the learned results.

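The model-free Q-learning the abstract describes can be sketched as a simple tabular update; the robot action names and hyperparameter values below are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

# Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["forward", "turn_left", "turn_right", "kick"]  # hypothetical robot actions

Q = defaultdict(float)  # maps (state, action) -> value, defaults to 0.0

def choose_action(state):
    """Epsilon-greedy: explore occasionally, otherwise act greedily."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning backup after observing a transition."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

No environment model appears anywhere in the update: the agent only needs sampled transitions, which is what makes the method attractive for a hard-to-model setting like robot soccer.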

Development of Optimal Design Technique of RC Beam using Multi-Agent Reinforcement Learning (다중 에이전트 강화학습을 이용한 RC보 최적설계 기술개발)

  • Kang, Joo-Won;Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.23 no.2 / pp.29-36 / 2023
  • Reinforcement learning (RL) is widely applied to various engineering fields. In particular, RL has shown successful performance in control problems such as vehicles, robotics, and active structural control systems. However, little research on the application of RL to optimal structural design has been conducted to date. In this study, the applicability of RL to the structural design of reinforced concrete (RC) beams was investigated. An RC beam design problem introduced in a previous study was used for comparison. The deep Q-network (DQN), a well-known RL algorithm that performs well in discrete action spaces, was adopted. The action of the DQN agent must represent the design variables of the RC beam, but there are too many design variables to represent with the action of a conventional single-agent DQN. To solve this problem, a multi-agent DQN was used. For a more effective learning process, double DQN (DDQN), an improved version of the conventional DQN, was employed. The multi-agent DDQN was trained to produce optimal RC beam designs satisfying ACI 318 (American Concrete Institute) without any hand-labeled dataset. Five DDQN agents provide actions for beam width, beam depth, main rebar size, number of main rebars, and shear stirrup size, respectively. The agents were trained for 10,000 episodes, and the performance of the multi-agent DDQN was evaluated on 100 test design cases. This study shows that the multi-agent DDQN algorithm can successfully produce structural design results for RC beams.

Bi-directional Electricity Negotiation Scheme based on Deep Reinforcement Learning Algorithm in Smart Building Systems (스마트 빌딩 시스템을 위한 심층 강화학습 기반 양방향 전력거래 협상 기법)

  • Lee, Donggu;Lee, Jiyoung;Kyeong, Chanuk;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.215-219 / 2021
  • In this paper, we propose a bi-directional electricity negotiation scheme based on deep reinforcement learning, in which a smart building and the utility grid adjust and propose the prices at which they are willing to trade. By employing a deep Q-network (DQN) algorithm, a kind of deep reinforcement learning algorithm, the proposed scheme adjusts the price proposals of the smart building and the utility grid. Simulation results verify that reaching consensus on the electricity price requires an average of 43.78 negotiation rounds. The negotiation process under the simulation settings and scenario can also be confirmed from the results.

Simple Q-learning using heuristic strategies (휴리스틱 전략을 이용한 Q러닝의 학습 간단화)

  • Park, Jong-cheol;Kim, Hyeon-cheol
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.708-710 / 2018
  • Reinforcement learning can substitute for game AI, but it is hard to train in imperfect-information games. For a card game whose imperfect information makes learning complex, we devised heuristic strategies and reduced the complexity of learning by grouping similar states together. Using Q-learning alone, without an artificial neural network, the agent learned a state-dependent choice of strategy over 50,000 games. As a result, it achieved a higher win rate than play that uses only a single fixed strategy, and it was observed to select different strategies in different states.
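The state-grouping idea above (merging similar raw states so a tabular method stays tractable without a neural network) can be sketched as bucketing observations before the table lookup; the card-game features, strategy names, and bucket boundaries here are hypothetical.

```python
from collections import defaultdict

Q = defaultdict(float)  # table over (abstract_state, strategy)
STRATEGIES = ["aggressive", "defensive", "bluff"]  # hypothetical heuristic strategies

def abstract_state(hand_strength, cards_left):
    """Group similar raw states into coarse buckets so the Q table stays small.
    hand_strength in [0, 1] is bucketed into thirds; cards_left into early/late game."""
    strength_bucket = min(2, int(hand_strength * 3))
    phase = "early" if cards_left > 5 else "late"
    return (strength_bucket, phase)

def learn(raw_state, strategy, reward, alpha=0.1):
    """Bandit-style terminal update: one game ends with a single win/loss reward."""
    s = abstract_state(*raw_state)
    Q[(s, strategy)] += alpha * (reward - Q[(s, strategy)])
```

With only 3 x 2 abstract states and 3 strategies, the table has 18 entries, so tens of thousands of games give each entry ample visits.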

An Adaptive Scheduling Algorithm for Manufacturing Process with Non-stationary Rework Probabilities (비안정적인 Rework 확률이 존재하는 제조공정을 위한 적응형 스케줄링 알고리즘)

  • Shin, Hyun-Joon;Ru, Jae-Pil
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.11 / pp.4174-4181 / 2010
  • This paper presents an adaptive scheduling algorithm for manufacturing processes with non-stationary rework probabilities. The proposed scheme, named the hybrid Q-learning algorithm, makes use of the non-stationary rework probability and is coupled with artificial neural networks. The algorithm is evaluated by mean tardiness, and extensive computational results show that it produces very efficient schedules, superior to existing dispatching algorithms.

Types of students' attitudes toward non-face-to-face classes in universities caused by Covid-19: Focusing on the Q methodological approach (코비드-19로 인한 대학의 비대면 수업에 대한 학생들의 태도 유형: Q 방법론적 접근을 중심으로)

  • Choi, Wonjoo;Seo, Sangho
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.223-231 / 2022
  • Covid-19, which has greatly changed our daily lives, has also brought major changes to college education. As classes shifted from traditional face-to-face instruction to non-face-to-face formats, both teachers and students had difficulty adapting, and problems such as gaps in academic achievement caused by non-face-to-face classes were raised. This study therefore examines the attitudes students hold toward non-face-to-face university classes during Covid-19. Applying the Q methodology, it identifies the types of subjective perceptions college students have of non-face-to-face classes and suggests points of reference for developing and improving such classes. Five types were found from an analysis using 30 P samples and 34 Q samples: first, a learning-efficiency-oriented type; second, a class-participation-and-communication-oriented type; third, a type that actively accepts and utilizes non-face-to-face classes; fourth, a type dissatisfied with remote-system and equipment errors; and fifth, a type that responds passively according to the situation. The results suggest the need to develop educational methods for effective non-face-to-face classes that consider the characteristics of each type, and the merits of non-face-to-face classes, especially recorded lectures, for learning efficiency are evident. Therefore, even when universities return entirely to face-to-face classes, providing recorded lecture videos is expected to greatly help students' learning.

Robust tuning of quadratic criterion-based iterative learning control for linear batch system

  • Kim, Won-Cheol;Lee, Kwang-Soon
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 1996.10a / pp.303-306 / 1996
  • We propose a robust tuning method for the quadratic-criterion-based iterative learning control (Q-ILC) algorithm for discrete-time linear batch systems. First, we establish a frequency-domain representation for batch systems. Next, a robust convergence condition is derived in the frequency domain. Based on this condition, we propose optimizing the weighting matrices so that the upper bound of the robustness measure is minimized. Numerical simulation shows that the designed learning filter restores robustness under significant model uncertainty.

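A minimal sketch of the batch-to-batch Q-ILC update for a lifted (trial-domain) linear system: the learning filter minimizes a quadratic criterion in tracking error and input change, and the input is refined from trial to trial. The plant matrix, weightings, and reference below are illustrative assumptions, not the paper's example.

```python
import numpy as np

def qilc_gain(G, Q, R):
    """Learning filter for quadratic-criterion ILC: minimizes
    e'Qe + du'R du over the input update du for the lifted plant y = G u."""
    return np.linalg.solve(G.T @ Q @ G + R, G.T @ Q)

def qilc_step(u, e, L):
    """One batch-to-batch update: u_{k+1} = u_k + L e_k."""
    return u + L @ e

# Illustrative 3-sample batch plant (lower-triangular lifted model)
G = np.tril(np.ones((3, 3)))
L = qilc_gain(G, np.eye(3), 0.1 * np.eye(3))

# Refine the input over repeated trials to track a reference
r = np.array([1.0, 2.0, 3.0])
u = np.zeros(3)
for _ in range(50):
    y = G @ u            # run one batch
    u = qilc_step(u, r - y, L)
```

A larger input-change weight R slows convergence but makes the update less aggressive, which is the knob the robust-tuning question in the abstract concerns.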

Strategy of Reinforcement Learning in Artificial Life (인공생명의 연구에 있어서 강화학습의 전략)

  • 심귀보;박창현
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.257-260 / 2001
  • In general, machine learning can be classified by the presence or absence of a teacher signal into supervised learning, unsupervised learning, and reinforcement learning with an indirect teacher. The term reinforcement learning originally comes from studies of animal learning in experimental psychology, but recently it has attracted much attention as a learning algorithm for neural networks in engineering, particularly in the field of artificial life. Reinforcement learning finds a state-action rule or action-generation strategy that maximizes the reward for the actions of a controller or agent. This paper introduces recently studied reinforcement learning methods and research trends, and emphasizes the importance of reinforcement learning in artificial life research in particular.


A Naive Bayesian-based Model of the Opponent's Policy for Efficient Multiagent Reinforcement Learning (효율적인 멀티 에이전트 강화 학습을 위한 나이브 베이지만 기반 상대 정책 모델)

  • Kwon, Ki-Duk
    • Journal of Internet Computing and Services / v.9 no.6 / pp.165-177 / 2008
  • An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy in a dynamic environment where other agents can influence its performance. Most previous work on multiagent reinforcement learning tends to apply single-agent techniques without extension, or requires unrealistic assumptions even when explicit models of other agents are used. In this paper, a Naive Bayesian policy model of the opponent agent is introduced, and a multiagent reinforcement learning method using this model is explained. Unlike previous work, the proposed method uses the Naive Bayesian policy model rather than a model of the opponent's Q function. Moreover, it can improve learning efficiency because the model is simpler than richer but time-consuming policy models such as finite state machines (FSMs) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed Naive Bayesian policy model is analyzed through experiments using this game as a test bed.

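The opponent-policy idea above can be sketched as a Laplace-smoothed Naive Bayes classifier over observed (state features, action) pairs; the feature encoding, action names, and smoothing constants are hypothetical illustrations, not the paper's formulation.

```python
from collections import defaultdict

class NaiveBayesOpponentModel:
    """Predicts P(action | state features), treating features as conditionally
    independent and estimating probabilities from Laplace-smoothed counts."""

    def __init__(self, actions):
        self.actions = actions
        self.action_counts = defaultdict(int)   # N(a)
        self.feature_counts = defaultdict(int)  # N(feature_i = v, a)
        self.total = 0

    def observe(self, features, action):
        """Record one observed opponent decision."""
        self.total += 1
        self.action_counts[action] += 1
        for i, v in enumerate(features):
            self.feature_counts[(i, v, action)] += 1

    def predict(self, features):
        """Return the most probable opponent action (MAP estimate)."""
        def score(a):
            p = (self.action_counts[a] + 1) / (self.total + len(self.actions))
            for i, v in enumerate(features):
                # add-one smoothing, assuming binary feature values here
                p *= (self.feature_counts[(i, v, a)] + 1) / (self.action_counts[a] + 2)
            return p
        return max(self.actions, key=score)
```

Counting updates and a product of per-feature ratios are all that is needed at decision time, which is what makes this model cheaper than an FSM or Markov-chain opponent model.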

Optimal Scheduling of Satellite Tracking Antenna of GNSS System (다중위성 추적 안테나의 위성추적 최적 스케쥴링)

  • Ahn, Chae-Ik;Shin, Ho-Hyun;Kim, You-Dan;Jung, Seong-Kyun;Lee, Sang-Uk;Kim, Jae-Hoon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.36 no.7 / pp.666-673 / 2008
  • To construct an accurate radio satellite navigation system, efficient communication between each satellite and the ground station is very important. Through this communication, the orbit of each satellite can be corrected, and that information is used by the operator to analyze satellite status. Since ground station resources are limited, the schedule of the antenna's azimuth and elevation angles should be optimized. Moreover, a satellite in medium Earth orbit does not pass over the same point on the Earth's surface because of the Earth's rotation, so the antenna pass schedule must be updated at the proper moment. In this study, a Q-learning approach, a form of model-free reinforcement learning, and a genetic algorithm are considered to find the optimal antenna schedule. Numerical simulations are conducted to verify the optimality of the solution.
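A toy sketch of the genetic-algorithm side of such a scheduling problem: each gene picks which satellite the antenna tracks in a time slot, and fitness rewards tracking a visible satellite while penalizing retargeting between slots. The slot/visibility encoding, fitness terms, and GA parameters are all illustrative assumptions, not the paper's formulation.

```python
import random

random.seed(1)
N_SLOTS, N_SATS = 8, 3
# Which satellites are visible in each time slot (hypothetical geometry)
VISIBLE = [{0, 1}, {0, 1}, {1}, {1, 2}, {2}, {0, 2}, {0}, {0}]

def fitness(schedule):
    """Reward contacts with visible satellites; penalize antenna retargeting."""
    contacts = sum(1.0 for t, s in enumerate(schedule) if s in VISIBLE[t])
    switches = sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)
    return contacts - 0.2 * switches

def evolve(pop_size=30, generations=60):
    """Elitist GA with one-point crossover and point mutation."""
    pop = [[random.randrange(N_SATS) for _ in range(N_SLOTS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_SLOTS)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:           # point mutation
                child[random.randrange(N_SLOTS)] = random.randrange(N_SATS)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

The Q-learning alternative mentioned in the abstract would instead treat each slot as a state and learn slot-by-slot tracking decisions from reward feedback rather than evolving whole schedules.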