• Title/Abstract/Keyword: Path of Reinforcement


Path Planning of Unmanned Aerial Vehicle based Reinforcement Learning using Deep Q Network under Simulated Environment

  • 이근형;김신덕
    • 반도체디스플레이기술학회지 / Vol. 16, No. 3 / pp.127-130 / 2017
  • In this research, we present a path planning method for the autonomous flight of unmanned aerial vehicles (UAVs) through reinforcement learning in a simulated environment. We design a simulator for reinforcement learning of the UAV and implement an interface for compatibility between the Deep Q-Network (DQN) and the simulator. We perform reinforcement learning through the simulator and the DQN, using the Q-learning algorithm, a type of reinforcement learning algorithm. Through experiments, we verify the performance of the DQN-simulator combination. Finally, we evaluate the learning results and suggest a path planning strategy using reinforcement learning.
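As a rough illustration of the kind of Q-learning loop such a simulator interface wraps, the tabular sketch below plans a path on a small grid. The 5x5 grid, the step/goal rewards, and the hyperparameters are all assumptions for illustration, not the paper's setup:

```python
import random

# Hypothetical 5x5 grid: fly from (0, 0) to (4, 4); -1 per step, +10 at the goal.
SIZE, GOAL = 5, (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # E, W, S, N

def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):  # leaving the grid: stay put
        r, c = state
    return (r, c), (10.0 if (r, c) == GOAL else -1.0)

random.seed(0)
Q = {(s, a): 0.0 for s in [(r, c) for r in range(SIZE) for c in range(SIZE)]
     for a in range(4)}
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(2000):                          # epsilon-greedy training episodes
    s = (0, 0)
    while s != GOAL:
        a = random.randrange(4) if random.random() < eps \
            else max(range(4), key=lambda i: Q[(s, i)])
        s2, r = step(s, ACTIONS[a])
        # Q-learning update: bootstrap from the best next-state value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in range(4)) - Q[(s, a)])
        s = s2

s, steps = (0, 0), 0                            # greedy rollout of the learned policy
while s != GOAL and steps < 50:
    s, _ = step(s, ACTIONS[max(range(4), key=lambda i: Q[(s, i)])])
    steps += 1
print(steps)  # 8, the Manhattan distance on an empty grid
```

A DQN replaces the `Q` table with a neural network so the same loop scales to state spaces too large to enumerate.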


Path Planning for a Robot Manipulator based on Probabilistic Roadmap and Reinforcement Learning

  • Park, Jung-Jun;Kim, Ji-Hun;Song, Jae-Bok
    • International Journal of Control, Automation, and Systems / Vol. 5, No. 6 / pp.674-680 / 2007
  • The probabilistic roadmap (PRM) method, a popular path planning scheme for manipulators, can find a collision-free path by connecting the start and goal poses through a roadmap constructed by drawing random nodes in the free configuration space. PRM exhibits robust performance in static environments, but its performance is poor in dynamic environments. On the other hand, reinforcement learning, a behavior-based control technique, can deal with uncertainties in the environment. A reinforcement learning agent can establish a policy that maximizes the sum of rewards by selecting the optimal action in each state through iterative interactions with the environment. In this paper, we propose efficient real-time path planning that combines PRM and reinforcement learning to deal with uncertain dynamic environments and with environments similar to ones already learned. A series of experiments demonstrate that the proposed hybrid path planner can generate a collision-free path even in dynamic environments in which objects block the pre-planned global path. It is also shown that the hybrid planner can adapt to similar, previously learned environments without significant additional learning.
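The roadmap construction the abstract describes — random nodes in free space, connected when a straight edge is collision-free, then searched for a shortest path — can be sketched as below. The unit square, the single circular obstacle, and the sampling parameters are illustrative assumptions, not the paper's manipulator setup:

```python
import heapq, math, random

random.seed(1)
OBST, RADIUS = (0.5, 0.5), 0.2                 # one circular obstacle (assumed)

def free(p):                                    # configuration is collision-free?
    return math.dist(p, OBST) > RADIUS

def edge_free(p, q, n=20):                      # sample n points along the segment
    return all(free((p[0] + i/n * (q[0] - p[0]), p[1] + i/n * (q[1] - p[1])))
               for i in range(n + 1))

start, goal = (0.05, 0.05), (0.95, 0.95)
nodes = [start, goal]
while len(nodes) < 300:                         # draw random collision-free nodes
    p = (random.random(), random.random())
    if free(p):
        nodes.append(p)

adj = {i: [] for i in range(len(nodes))}        # connect nearby collision-free pairs
for i in range(len(nodes)):
    for j in range(i + 1, len(nodes)):
        if math.dist(nodes[i], nodes[j]) < 0.3 and edge_free(nodes[i], nodes[j]):
            adj[i].append(j); adj[j].append(i)

dist = {0: 0.0}; pq = [(0.0, 0)]                # Dijkstra: start (0) to goal (1)
while pq:
    d, u = heapq.heappop(pq)
    if d > dist.get(u, math.inf):
        continue
    for v in adj[u]:
        nd = d + math.dist(nodes[u], nodes[v])
        if nd < dist.get(v, math.inf):
            dist[v] = nd; heapq.heappush(pq, (nd, v))
print(dist[1] >= math.dist(start, goal))        # roadmap path can't beat the straight line
```

Re-running the search when an edge becomes blocked is where a hybrid planner like the paper's would instead fall back on its learned policy.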

RL-based Path Planning for SLAM Uncertainty Minimization in Urban Mapping

  • 조영훈;김아영
    • 로봇학회논문지 / Vol. 16, No. 2 / pp.122-129 / 2021
  • In the Simultaneous Localization and Mapping (SLAM) problem, different paths lead to different SLAM results, since SLAM follows the trail of its input data. Active SLAM, which determines where to sense next, can suggest a better path for a better SLAM result during the data acquisition step. In this paper, we use reinforcement learning to decide where to perceive. By setting coverage of the entire target area as the goal and treating uncertainty as a negative reward, the reinforcement learning network finds an optimal path that minimizes trajectory uncertainty and maximizes map coverage. However, most active SLAM research has been performed in indoor or aerial environments where robots can move in any direction. In urban environments, vehicles can only move along the road structure and must obey traffic rules. A graph structure can express the road environment efficiently, with crossroads as nodes and streets as edges. In this paper, we propose a novel method to find an optimal SLAM path using this graph structure and a reinforcement learning technique.
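A minimal reading of that reward design — coverage as the goal, uncertainty as a negative reward, on a road graph — might look like the toy below. The graph, the weights, and the loop-closure rule are entirely hypothetical, chosen only to make the trade-off concrete:

```python
# Hypothetical road graph: crossroads are nodes, streets are edges (one square block).
ROADS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def reward(covered, edge, uncertainty):
    cover_bonus = 1.0 if edge not in covered else 0.0   # map-coverage term
    return cover_bonus - 0.1 * uncertainty              # uncertainty as negative reward

# Score one candidate loop 0 -> 1 -> 3 -> 2 -> 0 under this reward design.
path, covered, unc, total = [0, 1, 3, 2, 0], set(), 0.0, 0.0
for a, b in zip(path, path[1:]):
    edge = frozenset((a, b))
    seen_nodes = {n for e in covered for n in e}
    unc = 0.0 if b in seen_nodes else unc + 0.2         # revisit ~ loop closure resets drift
    total += reward(covered, edge, unc)
    covered.add(edge)
print(round(total, 2))  # 3.88: full coverage, with one loop closure at the end
```

An RL agent would learn to prefer paths whose accumulated score is high, i.e. loops that both cover new streets and close back on themselves to bound drift.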

Reinforcement Learning Using State Space Compression

  • 김병천;윤병주
    • 한국정보처리학회논문지 / Vol. 6, No. 3 / pp.633-640 / 1999
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In such environments, reinforcement learning methods like Q-learning and TD (Temporal Difference) learning therefore learn faster than conventional statistical learning methods. However, because many of the proposed reinforcement learning algorithms give a reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution very slowly. In this paper, we present the COMREL (COMpressed REinforcement Learning) algorithm for finding the shortest path quickly in a maze environment: it selects candidate states that can lie on the shortest path in a compressed maze environment and learns only those candidate states. Comparing COMREL with the existing Q-learning and Prioritized Sweeping algorithms shows that its learning time is greatly reduced.


Goal-Directed Reinforcement Learning System

  • 이창훈
    • 한국인터넷방송통신학회논문지 / Vol. 10, No. 5 / pp.265-270 / 2010
  • Reinforcement learning performs learning through trial-and-error interaction with a dynamic environment. In dynamic environments, reinforcement learning methods such as TD learning and TD(λ) learning can therefore learn faster than conventional statistical learning methods. However, because most of the proposed reinforcement learning algorithms give a reinforcement value only when the learning agent reaches the goal state, they converge to the optimal solution very slowly. In this paper, we propose GDRLS (Goal-Directed Reinforcement Learning System), a reinforcement learning method that can quickly find the shortest path in a maze environment. GDRLS selects candidate states that can lie on the shortest path in the maze environment and then learns only those candidate states to search for the shortest path. Experiments show that GDRLS finds the shortest path faster than TD learning and TD(λ) learning in maze environments.

Reinforcement Learning for Node-disjoint Path Problem in Wireless Ad-hoc Networks

  • 장길웅
    • 한국정보통신학회논문지 / Vol. 23, No. 8 / pp.1011-1017 / 2019
  • This paper proposes reinforcement learning for the node-disjoint path problem, which establishes multiple paths for reliable data transmission in wireless ad-hoc networks. The node-disjoint path problem is to determine multiple paths between a source and a destination such that no intermediate nodes are shared. Using Q-learning, one of the machine learning techniques, we propose an optimization method that accounts for transmission distance in large-scale wireless ad-hoc networks with many nodes. Solving the node-disjoint path problem in such large networks requires a great deal of computation, but the proposed reinforcement learning derives suitable results by learning paths efficiently. Its performance was evaluated in terms of the transmission distance needed to establish two node-disjoint paths; compared with a previously proposed simulated annealing approach, it showed better performance in transmission distance.
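For scale, a common non-learning baseline for two node-disjoint paths is successive shortest-path search: find one path, ban its relay nodes, and search again. The sketch below does this on a hypothetical 6-node topology; it is a greedy heuristic, not the paper's Q-learning, and it can fail on graphs where Suurballe-style rerouting is needed:

```python
from collections import deque

def bfs_path(adj, src, dst, banned=frozenset()):
    """Shortest hop-count path from src to dst, avoiding banned nodes."""
    prev, seen = {}, {src}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:                         # reconstruct the path backwards
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj[u]:
            if v not in seen and v not in banned:
                seen.add(v); prev[v] = u; q.append(v)
    return None

# Hypothetical 6-node ad-hoc topology with two disjoint routes from 0 to 5.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
p1 = bfs_path(adj, 0, 5)
p2 = bfs_path(adj, 0, 5, banned=frozenset(p1[1:-1]))  # forbid p1's relay nodes
print(p1, p2)  # [0, 1, 3, 5] [0, 2, 4, 5]
```

A Q-learning formulation like the paper's would instead learn which relay to pick per hop, trading this exhaustive search for a policy that scales to networks with many nodes.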

UAV Path Planning based on Deep Reinforcement Learning using Cell Decomposition Algorithm

  • 김경훈;황병선;선준호;김수현;김진영
    • 한국인터넷방송통신학회논문지 / Vol. 24, No. 3 / pp.15-20 / 2024
  • In UAV path planning, avoiding collisions with obstacles is important in complex environments containing both static and dynamic obstacles. Path planning algorithms such as RRT and A* effectively avoid static obstacles, but their computational complexity grows in high-dimensional environments. Reinforcement-learning-based algorithms can reflect complex environments, but, like conventional path planning algorithms, their training complexity also grows with environment dimensionality, making convergence hard to achieve. This paper proposes a reinforcement learning model that uses a cell decomposition algorithm. The proposed model decomposes the learning environment into fine-grained cells to reduce its complexity, and restricts the agent to valid actions to improve obstacle avoidance. This addresses the exploration problem of reinforcement learning and improves training convergence. Simulation results show that, compared with a reinforcement learning model in a conventional environment, the proposed model improves learning speed and plans efficient paths.
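The two ideas in the abstract — decomposing the environment into cells and restricting the agent to valid actions — can be sketched as an occupancy grid whose action set excludes moves into blocked cells. The grid dimensions and obstacle layout below are assumptions for illustration:

```python
# Approximate cell decomposition: a coarse grid, with cells overlapping obstacles blocked.
COLS, ROWS = 6, 4
OBSTACLES = {(2, 1), (2, 2), (3, 1)}                   # blocked cells (hypothetical)

def valid_actions(cell):
    """Offer only the moves that stay on the grid and avoid blocked cells."""
    x, y = cell
    return [(dx, dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < COLS and 0 <= y + dy < ROWS
            and (x + dx, y + dy) not in OBSTACLES]

print(valid_actions((2, 0)))  # [(1, 0), (-1, 0)]: cell above is blocked, below is off-grid
```

Pruning invalid actions this way shrinks the exploration space before any learning happens, which is the convergence benefit the abstract points to.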

A Study of Unmanned Aerial Vehicle Path Planning using Reinforcement Learning

  • Kim, Cheong Ghil
    • 반도체디스플레이기술학회지 / Vol. 17, No. 1 / pp.88-92 / 2018
  • Currently, the drone industry has become one of the fastest growing markets, and technologies for unmanned aerial vehicles are expected to continue developing at a rapid rate. In particular, small unmanned aerial vehicle systems have been designed and utilized in various fields, each with its own specific purpose. In these fields, the path planning problem of finding the shortest path between two points is important. In this paper, we introduce a path planning strategy for the autonomous flight of unmanned aerial vehicles through reinforcement learning with a self-positioning technique. We apply the Q-learning algorithm, a kind of reinforcement learning algorithm. At the same time, multiple sensors (acceleration, gyro, and magnetic) are used to estimate the position. For functional evaluation, the proposed method was simulated in a virtual UAV environment and the results were visualized. The flight history was based on a PX4-based drone system equipped with a smartphone.

Optimization of the Path of Inner Reinforcement for an Automobile Hood Using Design Sensitivity Analysis

  • 이태희;이동기;구자겸;한석영;임장근
    • 대한기계학회논문집A / Vol. 24, No. 1 / pp.62-68 / 2000
  • An optimization technique to find the path of an inner reinforcement of an automobile hood is proposed using design sensitivity information. The strength and modal characteristics of the automobile hood are analyzed, and their design sensitivities with respect to thickness are computed using MSC/NASTRAN. Based on the design sensitivity analysis, the choice of design variables and response functions is discussed. Techniques for improving the design from design sensitivity information are suggested, and the double-layer method is newly proposed to optimize the path of the stiffener for a shell structure. Using the suggested method, we redesign the inner reinforcement of an automobile hood and compare its responses with those of the original design. It is confirmed that the new design improves the frequency responses without increasing the weight.
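As a toy analogue of the thickness sensitivities the paper extracts from MSC/NASTRAN, the sketch below estimates the derivative of a response with respect to thickness by central finite differences. The linear frequency model f(t) = k·t is a made-up stand-in, not the hood's actual modal behavior:

```python
def first_frequency(t, k=120.0):
    """Hypothetical response model: first natural frequency proportional to thickness."""
    return k * t

def sensitivity(f, t, h=1e-6):
    """Central finite-difference estimate of df/dt at thickness t."""
    return (f(t + h) - f(t - h)) / (2 * h)

t0 = 0.008                                          # 0.8 mm panel thickness (assumed)
print(round(sensitivity(first_frequency, t0), 3))   # 120.0 — exact for a linear model
```

Ranking such sensitivities across candidate regions is what lets a designer decide where adding stiffener thickness buys the most frequency response per unit of weight.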

Leveraging Reinforcement Learning for Generating Construction Workers' Moving Path: Opportunities and Challenges

  • Kim, Minguk;Kim, Tae Wan
    • 국제학술발표논문집 / The 9th International Conference on Construction Engineering and Project Management / pp.1085-1092 / 2022
  • Travel distance is a parameter commonly used in the objective function of Construction Site Layout Planning (CSLP) automation models. To obtain travel distance, common approaches such as linear distance, shortest-distance algorithms, visibility graphs, and access road paths concentrate only on identifying the shortest path. However, humans do not necessarily follow the one shortest path; within a reasonable range, they may choose a safer and more comfortable path according to their situation. Thus, paths generated by these approaches may differ from the actual paths of workers, which may decrease the reliability of the optimized construction site layout. To solve this problem, this paper adopts reinforcement learning (RL), inspired by various concepts from cognitive science and behavioral psychology, to generate a realistic path that mimics the decision-making and wayfinding behavior of workers on a construction site. To do so, human wayfinding tendencies and the characteristics of the walking environment of construction sites are investigated, and the importance of taking these into account when simulating the actual paths of workers is emphasized. Furthermore, a simulation developed by mapping the identified tendencies to the reward design shows that the RL agent behaves like a real construction worker. Based on the research findings, opportunities and challenges are discussed. This study contributes to simulating the potential paths of workers based on deep RL, which can be used to calculate travel distances in CSLP automation models and so provide more reliable solutions.
