• Title/Summary/Keyword: deep reinforcement learning

Implementation of End-to-End Training of Deep Visuomotor Policies for Manipulation of a Robotic Arm of Baxter Research Robot (백스터 로봇의 시각기반 로봇 팔 조작 딥러닝을 위한 강화학습 알고리즘 구현)

  • Kim, Seongun;Kim, Sol A;de Lima, Rafael;Choi, Jaesik
    • The Journal of Korea Robotics Society / v.14 no.1 / pp.40-49 / 2019
  • Reinforcement learning has been applied to various problems in robotics. However, complex robotic manipulation tasks have remained hard to train, since few models are applicable to general tasks, and such general models require many training episodes. For these reasons, deep neural networks, which have proven to be good function approximators, have not been actively used for robot manipulation tasks. Recently, some of these challenges have been addressed by a family of methods, such as Guided Policy Search, that guide or limit the search direction while training a policy model based on a deep neural network. These frameworks have already been applied to a humanoid robot, PR2. However, in robotics it is not trivial to adapt algorithms designed for one robot to another. In this paper, we present our implementation of Guided Policy Search for the robotic arms of the Baxter Research Robot. To meet the goals and needs of the project, we build on an existing implementation of a Baxter Agent class for the Guided Policy Search code, using the robot's built-in Python interface. This work is expected to play an important role in popularizing reinforcement learning methods for robot manipulation on cost-effective robot platforms.

Map-Based Obstacle Avoidance Algorithm for Mobile Robot Using Deep Reinforcement Learning (심층 강화학습을 이용한 모바일 로봇의 맵 기반 장애물 회피 알고리즘)

  • Sunwoo, Yung-Min;Lee, Won-Chang
    • Journal of IKEEE / v.25 no.2 / pp.337-343 / 2021
  • Deep reinforcement learning is an artificial intelligence approach that enables a learner to select optimal actions from raw, high-dimensional input data. Much current research uses it to generate optimal movement paths for mobile robots in environments containing obstacles. In this paper, we selected the Dueling Double DQN (D3QN) algorithm with prioritized experience replay to generate the movement path of a mobile robot from images of its complex surroundings. The virtual environment was implemented in Webots, a robot simulator, and simulation confirmed that the mobile robot identified the positions of obstacles in real time and avoided them to reach its destination.
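The three named ingredients of D3QN with prioritized replay can be illustrated on plain numbers. The following is a minimal sketch of their textbook forms (dueling aggregation, the Double DQN target, and proportional prioritized sampling with importance-sampling weights), assuming scalar Q-values held in Python lists; it is not the authors' implementation, there is no image-based network here, and the defaults `alpha=0.6` and `beta=0.4` are commonly used values assumed for illustration.

```python
from typing import List

def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])

def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: List[float], q_target_next: List[float]) -> float:
    """Double DQN target: the online net selects a', the target net evaluates it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * q_target_next[argmax(q_online_next)]

def per_probabilities(td_errors: List[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> List[float]:
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha, p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def is_weights(probs: List[float], beta: float = 0.4) -> List[float]:
    """Importance-sampling weights (N * P(i))^-beta, normalised by their maximum."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]

# Toy numbers (hypothetical): three actions, e.g. turn left / go straight / turn right.
print(dueling_q(1.0, [0.5, -0.5, 0.0]))                            # [1.5, 0.5, 1.0]
print(double_dqn_target(1.0, False, 0.5, [0.2, 0.8], [6.0, 4.0]))  # 3.0
```

In the second call, a plain DQN target would take the maximum of the target network's values (6.0) and return 4.0; decoupling action selection from evaluation is what lets Double DQN reduce that overestimation.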

Modified Deep Reinforcement Learning Agent for Dynamic Resource Placement in IoT Network Slicing

  • Ros, Seyha;Tam, Prohim;Kim, Seokhoon
    • Journal of Internet Computing and Services / v.23 no.5 / pp.17-23 / 2022
  • Network slicing is a promising paradigm and a significant evolution for accommodating heterogeneous services with different requirements by dynamically placing virtual network function (VNF) forwarding graphs (VNFFGs) and orchestrating service function chaining (SFC) according to the criticality of Quality of Service (QoS) classes. In the system architecture, software-defined networks (SDN), network functions virtualization (NFV), and edge computing provide a resourceful data view, configurable virtual resources, and control interfaces for developing the modified deep reinforcement learning agent (MDRL-A). In this paper, task requests, tolerable delays, and required resources are differentiated in the input state observations to identify non-critical and critical classes, since each user equipment can run different QoS application services. We design intelligent slicing that handles cross-domain resources with MDRL-A to solve network problems and reduce resource usage. The agent interacts with controllers and orchestrators to manage flow-rule installation and physical resource allocation in the NFV infrastructure (NFVI), using the proposed formulation of completion time and criticality criteria. Simulations are conducted in an SDN/NFV environment, capturing QoS performance for the conventional and MDRL-A approaches.

Deep reinforcement learning for a multi-objective operation in a nuclear power plant

  • Junyong Bae;Jae Min Kim;Seung Jun Lee
    • Nuclear Engineering and Technology / v.55 no.9 / pp.3277-3290 / 2023
  • Nuclear power plant (NPP) operations with multiple objectives and devices are still performed manually by operators despite the potential for human error. These operations could be automated to reduce the burden on operators; however, classical approaches may not be suitable for such multi-objective tasks. An alternative is deep reinforcement learning (DRL), which has succeeded in automating various complex tasks and has been applied to automating certain NPP operations. Despite this progress, however, previous studies using DRL for NPP operations are limited in their ability to handle complex multi-objective operations with multiple devices efficiently. This study proposes a novel DRL-based approach that addresses these limitations by employing a continuous action space and simple binary rewards, supported by the adoption of soft actor-critic and hindsight experience replay. The feasibility of the proposed approach was evaluated on controlling the pressure and volume of the reactor coolant while heating the coolant during NPP startup. The results show that the proposed approach can train the agent with a proper strategy for effectively achieving multiple objectives through the control of multiple devices. Moreover, hands-on testing demonstrates that the trained agent can handle untrained objectives, such as cooldown, with substantial success.
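Binary rewards are sparse, and hindsight experience replay (HER) is the standard way to make them learnable: failed episodes are replayed as if the goal had been whatever the agent actually achieved. Below is a minimal sketch of HER's simple "final" relabeling strategy, assuming goals are numeric tuples (a hypothetical pressure/volume pair here) and a hypothetical per-component tolerance `tol`; the soft actor-critic learner that would consume the buffer is omitted, and the paper's actual reward definition and relabeling strategy are not given in the abstract.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]

@dataclass(frozen=True)
class Transition:
    state: Vec
    action: Vec      # continuous action vector
    next_state: Vec
    achieved: Vec    # goal quantities actually reached after this step
    goal: Vec        # goal the agent was asked to reach
    reward: float

def binary_reward(achieved: Vec, goal: Vec, tol: float) -> float:
    """1.0 if every goal component is within tol, otherwise 0.0."""
    return 1.0 if all(abs(a - g) <= tol for a, g in zip(achieved, goal)) else 0.0

def her_final(episode: List[Transition], tol: float) -> List[Transition]:
    """'final' HER strategy: relabel every step with the goal achieved at episode end."""
    new_goal = episode[-1].achieved
    return [replace(t, goal=new_goal, reward=binary_reward(t.achieved, new_goal, tol))
            for t in episode]

# Hypothetical 2-D goal, e.g. (pressure, volume) in arbitrary units.
goal = (5.0, 5.0)
episode = [
    Transition((0.0, 0.0), (0.3,), (1.0, 2.0), (1.0, 2.0), goal,
               binary_reward((1.0, 2.0), goal, 0.1)),
    Transition((1.0, 2.0), (0.2,), (1.5, 2.5), (1.5, 2.5), goal,
               binary_reward((1.5, 2.5), goal, 0.1)),
]
# Store both the original (all-failure) and the relabeled transitions.
replay_buffer = episode + her_final(episode, tol=0.1)
print([t.reward for t in replay_buffer])  # [0.0, 0.0, 0.0, 1.0]
```

The original episode never reaches the commanded goal, so it contributes only zero rewards; the relabeled copy contains a success signal the learner can bootstrap from.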

Deep Reinforcement Learning-Based C-V2X Distributed Congestion Control for Real-Time Vehicle Density Response (실시간 차량 밀도에 대응하는 심층강화학습 기반 C-V2X 분산혼잡제어)

  • Byeong Cheol Jeon;Woo Yoel Yang;Han-Shin Jo
    • Journal of IKEEE / v.27 no.4 / pp.379-385 / 2023
  • Distributed congestion control (DCC) is a technology that mitigates channel congestion and improves communication performance in high-density vehicular networks. Traditional DCC techniques reduce channel congestion without considering quality of service (QoS) requirements. Such a design can lead to excessive DCC actions, potentially degrading other aspects of QoS. To address this issue, we propose a deep reinforcement learning-based, QoS-adaptive DCC algorithm. The simulation was conducted using a quasi-real environment simulator that generates dynamic vehicle densities for evaluation. The results indicate that the proposed DCC algorithm achieves results closer to the targeted QoS than existing DCC algorithms.

A Study of Reinforcement Learning-based Cyber Attack Prediction using Network Attack Simulator (NASim) (네트워크 공격 시뮬레이터를 이용한 강화학습 기반 사이버 공격 예측 연구)

  • Bum-Sok Kim;Jung-Hyun Kim;Min-Suk Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.112-118 / 2023
  • As technology advances, preparedness against cyber-attacks becomes an increasingly critical problem. It is therefore imperative to consider various circumstances and to prepare strategic technologies against cyber-attacks. This paper proposes a method that applies reinforcement learning to network security problems. In general, traditional static cyber-security methods have difficulty responding effectively to modern, dynamic attack patterns. To address this, we implement cyber-attack scenarios such as 'Tiny Alpha' and 'Small Alpha' and evaluate the performance of various reinforcement learning methods using Network Attack Simulator (NASim), a cyber-attack simulation environment based on the Gymnasium (formerly OpenAI Gym) interface. We experimented with value-based methods (Q-Learning, Deep Q-Network, and Double Deep Q-Network) and policy-based methods (Actor-Critic). We observed that the value-based methods with discrete action spaces consistently outperformed the policy-based methods with continuous action spaces, with performance differences ranging from 20.9% to 53.2%. These results not only suggest opportunities for enhancing cyber-security strategies but also indicate potential applications in cyber-security education and system validation across many domains, such as the military, government, and corporate sectors.
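To make "value-based method with a discrete action space" concrete, here is a minimal sketch of tabular Q-learning, the first method listed. The environment is a hypothetical five-state chain (move left or right, reward 1 at the right end) chosen only for illustration; it has nothing to do with NASim's scenarios, and the hyperparameters are assumptions, not the paper's settings.

```python
import random
from typing import List

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state: int, action: int):
    """Deterministic chain dynamics; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q_row: List[float], rng: random.Random) -> int:
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])  # random tie-break

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy over the discrete action set.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q[s], rng)
            s2, r, done = step(s, a)
            # Off-policy TD target: bootstrap from the best next action (none at terminal).
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action for each non-terminal state
```

DQN and Double DQN replace the table `q` with a neural network over observations, but keep the same bootstrapped target; that reliance on a `max` over actions is why these methods are naturally tied to discrete action spaces.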

Punching Motion Generation using Reinforcement Learning and Trajectory Search Method (경로 탐색 기법과 강화학습을 사용한 주먹 지르기동작 생성 기법)

  • Park, Hyun-Jun;Choi, WeDong;Jang, Seung-Ho;Hong, Jeong-Mo
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.969-981 / 2018
  • Recent advances in machine learning approaches such as deep neural networks and reinforcement learning offer significant performance improvements in generating detailed and varied motions in physically simulated virtual environments. These optimization methods are highly attractive because they require less understanding of the underlying physics or mechanisms, even for high-dimensional, subtle control problems. In this paper, we propose an efficient learning method for a stochastic policy represented as a deep neural network, so that an agent can generate various energetic motions that adapt to changes in tasks and states without losing interactivity and robustness. This strategy is realized by our novel trajectory search method, motivated by the trust region policy optimization method. Our value-based trajectory smoothing technique finds stably learnable trajectories without consulting the neural network's responses directly. This policy is set as a trust region of the artificial neural network, so that it can learn the desired motion quickly.
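The trust region that motivates this method is, in trust region policy optimization, a bound on the KL divergence between the old and new stochastic policies. The sketch below shows only that constraint and a backtracking step that shrinks a proposed update until it satisfies the bound, assuming a diagonal Gaussian policy evaluated at a single state and a hypothetical `max_kl=0.01`. It is not the authors' trajectory search, and a full TRPO update would also average KL over sampled states, compute the step with conjugate gradient, and check surrogate-objective improvement.

```python
import math
from typing import List, Sequence

def kl_diag_gaussian(mu_old: Sequence[float], std_old: Sequence[float],
                     mu_new: Sequence[float], std_new: Sequence[float]) -> float:
    """KL(old || new) for diagonal Gaussians, summed over action dimensions."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl

def within_trust_region(mu_old, std_old, mu_new, std_new, max_kl=0.01) -> bool:
    return kl_diag_gaussian(mu_old, std_old, mu_new, std_new) <= max_kl

def backtrack(mu_old: Sequence[float], std: Sequence[float], step_dir: Sequence[float],
              max_kl: float = 0.01, shrink: float = 0.5, max_iters: int = 10) -> List[float]:
    """Halve a proposed mean update until the new policy stays inside the KL trust region."""
    scale = 1.0
    for _ in range(max_iters):
        mu_new = [m + scale * d for m, d in zip(mu_old, step_dir)]
        if within_trust_region(mu_old, std, mu_new, std, max_kl):
            return mu_new
        scale *= shrink
    return list(mu_old)  # no acceptable step found: keep the old policy

# Hypothetical 1-D action (e.g. one joint torque): a full step of +1.0 is too large.
print(backtrack([0.0], [1.0], [1.0]))  # accepted step after shrinking
```

With unit standard deviation, a mean shift of `d` costs `d**2 / 2` nats of KL, so the step is halved until it falls under the bound.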

Reinforcement Learning based Autonomous Emergency Steering Control in Virtual Environments (가상 환경에서의 강화학습 기반 긴급 회피 조향 제어)

  • Lee, Hunki;Kim, Taeyun;Kim, Hyobin;Hwang, Sung-Ho
    • Journal of Drive and Control / v.19 no.4 / pp.110-116 / 2022
  • Recently, various studies have applied deep learning and AI to many areas of autonomous driving, such as recognition, sensor processing, decision-making, and control. This paper proposes a reinforcement learning-based controller for autonomous vehicles that handles path following, static obstacle avoidance, and pedestrian avoidance. For repeated driving simulation, a reinforcement learning environment was built using virtual environments. After training on path-following scenarios, we compared control performance with Pure Pursuit and Stanley controllers, which are widely used for their good performance and simplicity. Based on test cases from the KNCAP test and assessment protocol, autonomous emergency steering and autonomous emergency braking scenarios were created and used for training. Experimental results, with zero collisions, demonstrated that the reinforcement learning controller succeeded in the stationary obstacle avoidance and pedestrian collision scenarios under the given conditions.
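For reference, the two geometric baselines are usually written as closed-form steering laws. The sketch below gives their textbook forms, Pure Pursuit δ = atan(2L·sin α / l_d) and Stanley δ = ψ_e + atan(k·e / (k_s + v)), assuming a kinematic bicycle model, angles in radians, and left-positive steering. The gains `k` and `k_soft`, the wheelbase, and the steering limit are hypothetical values; the abstract does not give the paper's tuning.

```python
import math

def pure_pursuit_steer(alpha: float, lookahead: float, wheelbase: float) -> float:
    """Pure Pursuit: alpha is the angle from the vehicle heading to the look-ahead point."""
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

def stanley_steer(heading_error: float, cross_track_error: float, speed: float,
                  k: float = 1.0, k_soft: float = 1e-3) -> float:
    """Stanley: heading correction plus a cross-track term that weakens as speed grows."""
    return heading_error + math.atan2(k * cross_track_error, k_soft + speed)

def clamp(delta: float, max_steer: float) -> float:
    """Saturate the command at the vehicle's physical steering limit."""
    return max(-max_steer, min(max_steer, delta))

# Hypothetical vehicle: 2.7 m wheelbase, 5 m look-ahead, 0.6 rad steering limit.
print(clamp(pure_pursuit_steer(0.2, 5.0, 2.7), 0.6))
print(clamp(stanley_steer(0.05, 0.5, 10.0), 0.6))
```

Both laws need a reference path and hand-tuned gains, which is the usual motivation for comparing them against a learned controller on the same path-following scenarios.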

Trading Strategies Using Reinforcement Learning (강화학습을 이용한 트레이딩 전략)

  • Cho, Hyunmin;Shin, Hyun Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.123-130 / 2021
  • With recent developments in computer technology, interest in machine learning has grown, leading to a significant increase in real business applications of machine learning in various sectors. In finance, predicting the future value of financial products has been a major challenge. Since the 1980s, the finance industry has relied on technical and fundamental analysis for this prediction. For future-value prediction models using machine learning, model design is of paramount importance for responding to market variables. This paper therefore quantitatively predicts the price movements of individual stocks listed on the KOSPI market using machine learning, specifically reinforcement learning. The DQN algorithm, proposed by Google DeepMind in 2013, and the A2C algorithm are used for reinforcement learning and applied to stock trading strategies. In addition, through experiments, input values that increase cumulative profit are selected, and their superiority is verified by comparison with baseline algorithms.

A Research on Low-power Buffer Management Algorithm based on Deep Q-Learning approach for IoT Networks (IoT 네트워크에서의 심층 강화학습 기반 저전력 버퍼 관리 기법에 관한 연구)

  • Song, Taewon
    • Journal of Internet of Things and Convergence / v.8 no.4 / pp.1-7 / 2022
  • As the number of IoT devices increases, power management of the cluster head, which acts as a gateway between the cluster and the sink nodes in an IoT network, becomes crucial. Particularly when the cluster head is a mobile wireless terminal, the power consumption of the IoT network must be minimized over its lifetime. In addition, information transmission delay is one of the primary metrics for rapid information collection in an IoT network. In this paper, we propose a low-power buffer management algorithm that takes information transmission delay into account. By deciding whether to forward or skip received packets using deep Q-learning, a deep reinforcement learning method, the proposed method reduces power consumption while also reducing transmission delay. The proposed approach is shown to reduce power consumption and improve delay relative to an existing buffer management technique under the slotted ALOHA protocol, used as the baseline for comparison.