• Title/Abstract/Keywords: Model based reinforcement learning

Search results: 155 (processing time: 0.024 s)

메타강화학습을 이용한 수중로봇 매니퓰레이터 제어 (Control for Manipulator of an Underwater Robot Using Meta Reinforcement Learning)

  • 문지윤;문장혁;배성훈
    • 한국전자통신학회논문지 / Vol. 16, No. 1 / pp. 95-100 / 2021
  • This paper proposes a model-based meta reinforcement learning method for controlling the manipulator of an underwater construction robot. Model-based meta reinforcement learning quickly updates its model using recent experience from the real application, then passes the model to a model predictive controller that computes the manipulator's control inputs for reaching a target position. Simulation environments for model-based meta reinforcement learning were built with MuJoCo and Gazebo, and the proposed method was validated under the model uncertainty found in the real control environment of an underwater construction robot.
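
As a rough illustration of the loop this abstract describes (fast model updates from recent experience, then model predictive control toward a target position), here is a minimal Python sketch. The `DynamicsModel` placeholder, the random-shooting planner, and every constant are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a model-based meta-RL + MPC loop (assumptions: a learned
# one-step dynamics model refreshed from recent transitions, and random-shooting
# MPC; `DynamicsModel`, `target`, and all constants are illustrative).
import numpy as np

class DynamicsModel:
    """Placeholder for a learned model s' = f(s, a)."""
    def fit(self, states, actions, next_states):
        ...  # e.g. a few gradient steps on recent data (the fast meta-update)

    def predict(self, state, action):
        # Stand-in dynamics for illustration (assumes state and action dims match).
        return state + 0.1 * action

def mpc_action(model, state, target, horizon=10, n_candidates=100, act_dim=3):
    """Random-shooting MPC: sample action sequences, roll them through the
    learned model, and return the first action of the lowest-cost sequence."""
    best_cost, best_first = np.inf, None
    for _ in range(n_candidates):
        seq = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        s, cost = state, 0.0
        for a in seq:
            s = model.predict(s, a)
            cost += np.linalg.norm(s - target)   # distance-to-target cost
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first
```

At each control step the model would first be refit on the latest transitions, then `mpc_action` would be called with the current state.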

효율적인 멀티 에이전트 강화 학습을 위한 나이브 베이지안 기반 상대 정책 모델 (A Naive Bayesian-based Model of the Opponent's Policy for Efficient Multiagent Reinforcement Learning)

  • 권기덕
    • 인터넷정보학회논문지 / Vol. 9, No. 6 / pp. 165-177 / 2008
  • One of the key issues in multi-agent reinforcement learning is how to learn an optimal action policy in a dynamic environment where other agents that can affect one's own performance are present. Most previous work on multi-agent reinforcement learning either applies single-agent reinforcement learning techniques largely unchanged or, even when a separate model of the other agents is used, relies on unrealistic assumptions. This paper introduces a Naive Bayesian action-policy model of the opponent agent and describes a reinforcement learning method that uses it. Unlike earlier multi-agent reinforcement learning studies, the proposed method learns a Naive Bayesian model of the opponent's action policy rather than a model of the opponent's Q-value function. Learning efficiency is also improved by using this comparatively simple policy model instead of more expressive but costly-to-learn models such as finite state automata or Markov chains. The paper introduces the cat-and-mouse game, a representative adversarial multi-agent environment, and uses it as a test bed for experiments analyzing the effectiveness of the proposed Naive Bayesian policy model.
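
A minimal sketch of what a Naive Bayesian opponent-policy model can look like, assuming discrete state features (each taking one of `n_values` values), discrete opponent actions, and Laplace smoothing; all names are illustrative rather than taken from the paper. The learning agent can then condition its own action values on `predict(features)` instead of estimating the opponent's Q function.

```python
# Counts-based Naive Bayes estimate of P(opponent action | state features).
from collections import defaultdict

class NaiveBayesOpponentModel:
    def __init__(self, n_actions, n_values, alpha=1.0):
        self.n_actions, self.n_values, self.alpha = n_actions, n_values, alpha
        self.action_counts = defaultdict(float)    # N(a)
        self.feature_counts = defaultdict(float)   # N(f_i = v, a)

    def observe(self, features, action):
        """Update counts from one observed (state features, opponent action) pair."""
        self.action_counts[action] += 1.0
        for i, v in enumerate(features):
            self.feature_counts[(i, v, action)] += 1.0

    def predict(self, features):
        """P(a | features) proportional to P(a) * prod_i P(f_i | a)."""
        total = sum(self.action_counts.values())
        probs = []
        for a in range(self.n_actions):
            p = (self.action_counts[a] + self.alpha) / (total + self.alpha * self.n_actions)
            for i, v in enumerate(features):
                p *= ((self.feature_counts[(i, v, a)] + self.alpha)
                      / (self.action_counts[a] + self.alpha * self.n_values))
            probs.append(p)
        z = sum(probs)
        return [p / z for p in probs]
```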


커리큘럼을 이용한 투서클 기반 항공기 헤드온 공중 교전 강화학습 기법 연구 (Two Circle-based Aircraft Head-on Reinforcement Learning Technique using Curriculum)

  • 황인수;배정호
    • 한국군사과학기술학회지 / Vol. 26, No. 4 / pp. 352-360 / 2023
  • Recently, AI pilots using reinforcement learning have been developed to a level that is more flexible than rule-based methods and could replace human pilots. In this paper, a curriculum is used to help learn head-on combat with reinforcement learning. Learning head-on engagements with reinforcement learning alone is not easy, but with the proposed two-circle-based head-on air combat learning technique, the ownship becomes proficient at head-on combat as the difficulty is gradually increased. On the two circles, learning proceeds while the ATA angle between ownship and target is gradually increased and the AA angle is gradually decreased. Models trained with and without the curriculum were then engaged against a rule-based model, and as the win ratio of the curriculum-based model rose to nearly 100 %, its superior performance was confirmed.
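
A minimal sketch of a curriculum schedule of the kind this abstract describes, where the initial ATA angle grows and the initial AA angle shrinks as the agent's recent win rate improves; the stage values, window, and threshold are illustrative assumptions, not the paper's settings.

```python
# Stage-based curriculum: advance the engagement geometry once the agent
# wins often enough at the current difficulty.
from collections import deque

class HeadOnCurriculum:
    """Gradually raise initial ATA and lower initial AA as the agent improves."""
    def __init__(self, ata_stages=(30, 60, 90, 120, 150, 180),
                 aa_stages=(150, 120, 90, 60, 30, 0),
                 win_threshold=0.8, window=100):
        self.ata_stages, self.aa_stages = ata_stages, aa_stages
        self.win_threshold, self.stage = win_threshold, 0
        self.recent = deque(maxlen=window)

    def episode_config(self):
        """Initial engagement geometry (degrees) for the current stage."""
        return {"ata_deg": self.ata_stages[self.stage],
                "aa_deg": self.aa_stages[self.stage]}

    def report(self, won):
        """Record one episode result; advance when the recent win rate is high."""
        self.recent.append(1.0 if won else 0.0)
        if (len(self.recent) == self.recent.maxlen
                and sum(self.recent) / len(self.recent) >= self.win_threshold
                and self.stage < len(self.ata_stages) - 1):
            self.stage += 1
            self.recent.clear()
```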

강화학습을 이용한 진화 알고리즘의 성능개선에 대한 연구 (A Study on Performance Improvement of Evolutionary Algorithms Using Reinforcement Learning)

  • 이상환;심귀보
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 1998년도 추계학술대회 학술발표 논문집 / pp. 420-426 / 1998
  • Evolutionary algorithms are probabilistic optimization algorithms based on the model of natural evolution. Recently, extensive efforts have been made to improve their performance. In this paper, we introduce research on improving the convergence rate and search capability of evolutionary algorithms by using reinforcement learning. After providing an introduction to evolutionary algorithms and reinforcement learning, we present adaptive genetic algorithms, reinforcement genetic programming, and reinforcement evolution strategies, each combined with reinforcement learning. Adaptive genetic algorithms generate mutation probabilities for each locus by interacting with the environment according to reinforcement learning. Reinforcement genetic programming executes crossover and mutation operations based on the reinforcement and inhibition mechanisms of reinforcement learning. Reinforcement evolution strategies use the variance of fitness caused by mutation to generate reinforcement signals that estimate and control the step length.
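
A minimal sketch of the adaptive-genetic-algorithm idea above, where per-locus mutation probabilities are reinforced or inhibited by a signal derived from the fitness change; the bit-string encoding and this particular update rule are assumptions for illustration, and the paper's exact scheme may differ.

```python
# Per-locus mutation probabilities adjusted by a reinforcement signal.
import random

def mutate(individual, p_mut):
    """Flip each bit independently with its per-locus mutation probability."""
    flipped = [random.random() < p for p in p_mut]
    child = [1 - b if f else b for b, f in zip(individual, flipped)]
    return child, flipped

def update_mutation_probs(p_mut, flipped, reward, lr=0.05,
                          p_min=0.001, p_max=0.5):
    """Reinforce loci whose mutation preceded a fitness gain (reward > 0),
    inhibit loci whose mutation preceded a loss (reward < 0)."""
    return [min(p_max, max(p_min, p + lr * reward)) if f else p
            for p, f in zip(p_mut, flipped)]
```

Here `reward` could be, for instance, the sign of `fitness(child) - fitness(parent)`.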


Acrobot Swing Up 제어를 위한 Credit-Assigned-CMAC 기반의 강화학습 (Credit-Assigned-CMAC-based Reinforcement Learning with application to the Acrobot Swing Up Control Problem)

  • 신연용;장시영;서승환;서일홍
    • 대한전기학회:학술대회논문집 / 대한전기학회 2003년도 학술회의 논문집 정보 및 제어부문 B / pp. 621-624 / 2003
  • For real-world applications of reinforcement learning techniques, function approximation or generalization will be required to avoid the curse of dimensionality. To this end, an improved function-approximation-based reinforcement learning method is proposed to speed up convergence by using CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation results are presented for the swing-up control problem of an acrobot.
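
A minimal sketch of CMAC value approximation with a simple credit-assignment rule, distributing the TD error over the active tiles in inverse proportion to each tile's update count; whether this matches the paper's exact credit-assignment scheme is an assumption, and all names are illustrative.

```python
# CMAC-style tile-coded value function with credit-weighted updates.
import numpy as np

class CACMAC:
    def __init__(self, n_tilings, tiles_per_tiling, lr=0.1):
        self.w = np.zeros(n_tilings * tiles_per_tiling)   # one weight per tile
        self.counts = np.zeros_like(self.w)               # per-tile update counts
        self.lr = lr

    def value(self, active_tiles):
        """Value = sum of weights of the tiles active for this state."""
        return self.w[active_tiles].sum()

    def update(self, active_tiles, target):
        error = target - self.value(active_tiles)
        # Credit assignment: less-trained tiles absorb more of the error.
        credit = 1.0 / (1.0 + self.counts[active_tiles])
        credit /= credit.sum()
        self.w[active_tiles] += self.lr * error * credit * len(active_tiles)
        self.counts[active_tiles] += 1.0
```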


Aspect-based Sentiment Analysis of Product Reviews using Multi-agent Deep Reinforcement Learning

  • M. Sivakumar;Srinivasulu Reddy Uyyala
    • Asia Pacific Journal of Information Systems / Vol. 32, No. 2 / pp. 226-248 / 2022
  • Existing models for sentiment analysis of product reviews learn from past data, and new data is labeled based on that training, but the new data is never used by the existing system when making a decision. The proposed Aspect-based multi-agent Deep Reinforcement learning Sentiment Analysis (ADRSA) model learns from its very first data without the help of any training dataset and labels each sentence with an aspect category and a sentiment polarity. It keeps learning from new data and updates its knowledge to improve its intelligence, so its decisions change over time as new data arrives. As a result, the accuracy of sentiment analysis using deep reinforcement learning improves over supervised and unsupervised learning methods, and the sentiments of premium customers on a particular site can be conveyed to other customers effectively. A dynamic environment with a strong knowledge base helps the system remember sentences, and using the State Action Reward State Action (SARSA) algorithm with the Bidirectional Encoder Representations from Transformers (BERT) model improved the performance of the proposed system in terms of accuracy compared to state-of-the-art methods.
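
The abstract pairs SARSA with BERT; a minimal sketch of that pairing might use BERT sentence embeddings as states and a linear Q function over aspect/polarity labels as actions. The label set, reward scheme, and all hyperparameters below are illustrative assumptions, not details from the paper.

```python
# SARSA over BERT [CLS] embeddings with a linear per-action Q function.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    """Use the [CLS] token embedding as the state representation."""
    with torch.no_grad():
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
        return encoder(**inputs).last_hidden_state[0, 0].numpy()

N_ACTIONS, DIM = 6, 768            # e.g. 3 aspect categories x 2 polarities
W = np.zeros((N_ACTIONS, DIM))     # linear Q(s, a) = W[a] . s

def choose(state, eps=0.1):
    """Epsilon-greedy action (label) selection."""
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(W @ state))

def sarsa_update(s, a, r, s_next, a_next, alpha=0.01, gamma=0.9):
    """On-policy TD update toward r + gamma * Q(s', a')."""
    td = r + gamma * (W[a_next] @ s_next) - (W[a] @ s)
    W[a] += alpha * td * s
```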

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • 전기전자학회논문지 / Vol. 23, No. 4 / pp. 1150-1156 / 2019
  • This paper explores a model-free, value-based approach for solving the survival gridworld problem, which poses the challenge of taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method combines hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors into account and are motivated to explore for better path decisions. Experiments suggest that the proposed method achieved better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
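
The abstract does not give the exact update equation, so the following is only one plausible reading of a hybrid on-/off-policy Q update with a parametric linear rectifier and a motivational discount; every parameter name here is illustrative rather than taken from the paper.

```python
# One possible hybrid Q update; Q is e.g. a dict mapping each state to a
# list of action values.
def leaky_rectifier(x, k=0.1):
    """Parametric linear rectifier: pass positive TD errors, scale negatives."""
    return x if x >= 0 else k * x

def hybrid_q_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.95,
                    beta=0.5, motivation=1.0):
    """Mix the off-policy (max) and on-policy (SARSA) targets with weight beta,
    scale the future term by a motivational discount, rectify the TD error."""
    target = r + motivation * gamma * (
        beta * max(Q[s2]) + (1 - beta) * Q[s2][a2]
    )
    Q[s][a] += alpha * leaky_rectifier(target - Q[s][a])
```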

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology / Vol. 54, No. 9 / pp. 3283-3292 / 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) subject to sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental actions gradually approach the optimal strategy for the current fault, and the agent updates its network using the rewards obtained during interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the reinforcement learning agent's decision-making process. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper controls the plant accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.
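
A minimal sketch of the incremental-action idea: the agent's discrete actions are control-signal increments applied to the previous output rather than absolute values. The increment set, limits, and reward below are illustrative assumptions, and `q_network` stands in for a trained DQN.

```python
# DQN-style control where actions are increments to the control signal.
import numpy as np

INCREMENTS = np.array([-0.05, -0.01, 0.0, +0.01, +0.05])  # candidate delta-u values
U_MIN, U_MAX = 0.0, 1.0                                   # actuator limits

def step_controller(q_network, state, u_prev, eps=0.05):
    """Pick an increment epsilon-greedily, apply it to the previous output."""
    if np.random.rand() < eps:
        a = np.random.randint(len(INCREMENTS))
    else:
        a = int(np.argmax(q_network(state)))   # one Q-value per increment
    u = float(np.clip(u_prev + INCREMENTS[a], U_MIN, U_MAX))
    return u, a

def reward(pressure, setpoint):
    """Penalty shrinks as the OTSG pressure approaches the set-point."""
    return -abs(pressure - setpoint)
```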

A Joint Allocation Algorithm of Computing and Communication Resources Based on Reinforcement Learning in MEC System

  • Liu, Qinghua;Li, Qingping
    • Journal of Information Processing Systems / Vol. 17, No. 4 / pp. 721-736 / 2021
  • For a mobile edge computing (MEC) system supporting a dense network, a joint allocation algorithm for computing and communication resources based on reinforcement learning is proposed. The energy consumption of task execution is defined as the maximum energy any single user spends executing its task in the system. Considering the constraints on task unloading, power allocation, transmission rate, and computing-resource allocation, the problem of joint task unloading and resource allocation is modeled as minimizing the maximum task-execution energy consumption. As a mixed-integer nonlinear programming problem, it is difficult to solve directly with traditional optimization methods, so this paper solves it with a reinforcement learning algorithm. The Markov decision process and the theoretical basis of reinforcement learning are then introduced to ground the simulation experiments. Based on the reinforcement learning algorithm and the joint allocation of communication resources, the data-task-unloading and power-control strategies are jointly optimized for each terminal device, and local computing and task unloading models are built. The simulation results show that the total task computation cost of the proposed algorithm is 5%-10% less than that of the two comparison algorithms under the same task input, and more than 5% less than that of two newer comparison algorithms.
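
A minimal sketch of casting the joint allocation problem as an MDP solved with tabular Q-learning, following the abstract's max-user-energy objective; the discretized unloading/power/CPU choices and all numbers are illustrative assumptions, not the paper's formulation.

```python
# Tabular Q-learning over a discretized joint action space, with reward equal
# to the negative maximum per-user energy (so maximizing return minimizes the
# worst-case task-execution energy).
import itertools, random
from collections import defaultdict

OFFLOAD = (0, 1)                    # local vs. edge execution
POWER   = (0.1, 0.5, 1.0)           # transmit power levels (W)
CPU     = (0.2, 0.5, 1.0)           # share of MEC CPU assigned
ACTIONS = list(itertools.product(OFFLOAD, POWER, CPU))

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state, eps=0.1):
    """Epsilon-greedy selection over the joint allocation choices."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: Q[state][i])

def q_learning_step(state, a_idx, energy_per_user, next_state,
                    alpha=0.1, gamma=0.9):
    """Standard Q-learning update with the max-user-energy penalty."""
    r = -max(energy_per_user)
    Q[state][a_idx] += alpha * (r + gamma * max(Q[next_state]) - Q[state][a_idx])
```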

Acrobot Swing Up Control을 위한 Credit-Assigned-CMAC-based 강화학습 (Credit-Assigned-CMAC-based Reinforcement Learning with Application to the Acrobot Swing Up Control Problem)

  • 장시영;신연용;서승환;서일홍
    • 대한전기학회논문지:시스템및제어부문D / Vol. 53, No. 7 / pp. 517-524 / 2004
  • For real-world applications of reinforcement learning techniques, function approximation or generalization will be required to avoid the curse of dimensionality. To this end, an improved function-approximation-based reinforcement learning method is proposed to speed up convergence by using CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation and experiment results are presented for the swing-up control problem of an acrobot. A code sketch of the credit-assigned CMAC update appears after the conference version of this work, above.