• Title/Summary/Keyword: PPO algorithm

A Study about the Usefulness of Reinforcement Learning in Business Simulation Games using PPO Algorithm (경영 시뮬레이션 게임에서 PPO 알고리즘을 적용한 강화학습의 유용성에 관한 연구)

  • Liang, Yi-Hong; Kang, Sin-Jin; Cho, Sung Hyun
    • Journal of Korea Game Society / v.19 no.6 / pp.61-70 / 2019
  • In this paper, we apply reinforcement learning to a management simulation game to check whether game agents can autonomously achieve a given goal. We apply the PPO (Proximal Policy Optimization) algorithm in the Unity Machine Learning (ML) Agent environment, and the game agent is designed to find a way to play automatically. Five game scenario simulation experiments were conducted to verify the usefulness of the approach. The results confirm that the game agent achieves the goal through learning even when environment variables in the game change.

Reinforcement learning-based control with application to the once-through steam generator system

  • Cheng Li; Ren Yu; Wenmin Yu; Tianshu Wang
    • Nuclear Engineering and Technology / v.55 no.10 / pp.3515-3524 / 2023
  • This paper proposes a reinforcement learning framework for controlling the outlet steam pressure of the once-through steam generator (OTSG). A double-layer controller using the Proximal Policy Optimization (PPO) algorithm is applied in the control structure of the OTSG. The PPO algorithm trains the neural networks continuously through interaction with the environment, and the trained controller then achieves better control of the OTSG. Because reinforcement learning is difficult to apply to real-world objects, this paper also proposes a pretraining method to address this problem. The difficulty lies in training: the optimal strategy for each step is derived through trial and error, so the training cost is very high. The LSTM model is adopted as the training environment for pretraining, which saves training time and improves efficiency. The experimental results show that this method can self-adjust the control parameters under various working conditions, and that the resulting control has small overshoot, fast stabilization, and strong adaptive ability.
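
For readers unfamiliar with PPO, the clipped surrogate objective at the core of the standard algorithm can be sketched as follows. This is a minimal illustration of the general PPO-Clip loss, not the controller or training setup of the paper above; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate objective over a batch (hypothetical helper)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps] to limit the size of each update.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

With a positive advantage, the clip caps how much the objective can gain from increasing an action's probability; with a negative advantage, a large ratio is still fully penalized.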

Cloud Task Scheduling Based on Proximal Policy Optimization Algorithm for Lowering Energy Consumption of Data Center

  • Yang, Yongquan; He, Cuihua; Yin, Bo; Wei, Zhiqiang; Hong, Bowei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1877-1891 / 2022
  • As a part of cloud computing technology, algorithms for cloud task scheduling have an important influence on cloud computing in data centers. In our earlier work, we proposed DeepEnergyJS, which was designed based on the original version of the policy gradient and reinforcement learning algorithm, and verified its effectiveness through simulation experiments. In this study, we used the Proximal Policy Optimization (PPO) algorithm to update DeepEnergyJS to DeepEnergyJSV2.0. First, we verify the convergence of the PPO algorithm on the Alibaba Cluster Data V2018 dataset. Then we compare it with the reinforcement learning algorithm in terms of convergence rate, converged value, and stability. The results indicate that PPO performed better on the training and test data sets than the reinforcement learning algorithm and other general heuristic algorithms, such as First Fit, Random, and Tetris. DeepEnergyJSV2.0 achieves about 7.814% better energy efficiency than DeepEnergyJS.
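
As a point of reference for the heuristic baselines named above, a First Fit scheduler can be sketched as follows. This is a generic illustration assuming a single scalar resource per task and per machine; the function name `first_fit` and its parameters are hypothetical and are not taken from the paper.

```python
def first_fit(task_demands, machine_capacities):
    """Place each task on the first machine with enough free capacity.

    Returns one machine index per task, or None when no machine can host it.
    """
    remaining = list(machine_capacities)  # free capacity per machine
    placement = []
    for demand in task_demands:
        chosen = None
        for idx, free in enumerate(remaining):
            if demand <= free:
                remaining[idx] = free - demand  # reserve capacity on that machine
                chosen = idx
                break
        placement.append(chosen)
    return placement
```

For example, `first_fit([4, 3, 5, 2], [6, 6])` places the first task on machine 0, the second on machine 1, leaves the third unplaced, and fits the fourth back onto machine 0.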

Adaptive Fast Calibration Method for Active Phased Array Antennas using PPO Algorithm (PPO 알고리즘을 이용한 능동위상배열안테나 적응형 고속 보정 방법)

  • Sunge Lee; Kisik Byun; Hong-Jib, Yoon
    • Journal of IKEEE / v.27 no.4 / pp.636-643 / 2023
  • In this paper, a high-speed calibration method for phased array antennas in the far field is presented. We propose a max calibration, which is a simplification of the rotating-element electric-field vector (REV) method that calibrates each antenna element using received power only, together with a method that groups calibrations by sub-array unit rather than by individual antenna element. Using the Proximal Policy Optimization (PPO) algorithm, we found a partitioning optimized for the distribution of phased array antennas and calibrated it on a sub-array basis. An adaptive max calibration method that allows faster calibration than the conventional method was proposed and verified through simulation. Compared with the conventional method, the gain of the phased array antenna is higher while calibration toward the target is in progress, and the beam pattern is closer to the ideal beam pattern.

Comparison of Reinforcement Learning Algorithms used in Game AI (게임 인공지능에 사용되는 강화학습 알고리즘 비교)

  • Kim, Deokhyung; Jung, Hyunjun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.693-696 / 2021
  • There are various algorithms in reinforcement learning, and the algorithm used differs depending on the field. In games as well, specific algorithms are used when developing AI (artificial intelligence) with reinforcement learning. Different algorithms have different learning methods, so the resulting AI differs. The developer therefore has to choose the algorithm appropriate to the purpose of the AI, which requires knowing each algorithm's learning method and which algorithms are effective for which AI. This paper compares the learning methods of three algorithms used to implement game AI: SAC, PPO, and POCA. It also discusses which types of AI implementations these algorithms are practical to apply to.

A Study on Asset Allocation Using Proximal Policy Optimization (근위 정책 최적화를 활용한 자산 배분에 관한 연구)

  • Lee, Woo Sik
    • Journal of the Korean Society of Industry Convergence / v.25 no.4_2 / pp.645-653 / 2022
  • Recently, deep reinforcement learning has been applied to a variety of industries, such as games, robotics, autonomous vehicles, and data cooling systems. Reinforcement learning allows automated asset allocation without the need for ongoing monitoring, and the algorithm is free to choose its own policies. The purpose of this paper is to carry out an empirical analysis of the performance of asset allocation strategies. The strategies considered include the conventional Mean-Variance Optimization (MVO) and Proximal Policy Optimization (PPO). According to the findings, PPO outperformed both its benchmark index and MVO. This paper demonstrates how dynamic asset allocation can benefit from the development of a reinforcement learning algorithm.

Design and Implementation of Reinforcement Learning Agent Using PPO Algorithm for Match 3 Gameplay (매치 3 게임 플레이를 위한 PPO 알고리즘을 이용한 강화학습 에이전트의 설계 및 구현)

  • Park, Dae-Geun; Lee, Wan-Bok
    • Journal of Convergence for Information Technology / v.11 no.3 / pp.1-6 / 2021
  • Most match-3 puzzle games support automatic play using the MCTS algorithm. However, implementing reinforcement learning agents is not easy because it requires both knowledge of machine learning and an understanding of the complex interactions within the development environment. This study proposes a method for easily designing reinforcement learning agents and implementing gameplay agents by applying the PPO (Proximal Policy Optimization) algorithm. We found that performance improved by about 44% over the conventional method. The tools we used are the Unity 3D game engine and the Unity ML SDK. The experimental results show that the agents came to learn the game rules and make better strategic decisions as the experiments progressed. On average, the puzzle gameplay agents implemented in this study played puzzle games better than ordinary human players. The designed agent is expected to be useful for speeding up the game level design process.

Flight Trajectory Simulation via Reinforcement Learning in Virtual Environment (가상 환경에서의 강화학습을 이용한 비행궤적 시뮬레이션)

  • Lee, Jae-Hoon; Kim, Tae-Rim; Song, Jong-Gyu; Im, Hyun-Jae
    • Journal of the Korea Society for Simulation / v.27 no.4 / pp.1-8 / 2018
  • The most common way to control a target point using artificial intelligence is through reinforcement learning. However, reinforcement learning has required complicated calculations that are difficult to implement. In this paper, the enhanced Proximal Policy Optimization (PPO) algorithm was used to simulate finding the planned flight trajectory to reach the target point in a virtual environment. In addition, variables such as changes in trajectory, effects of rewards, and external winds are added to examine how external environmental factors affect flight trajectory learning, and their effects on trajectory learning performance and learning speed are compared. The simulation results show that the agent can find the optimal trajectory in spite of changes in the various external environments, which suggests the approach will be applicable to an actual vehicle.

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization (Proximal Policy Optimization을 이용한 게임서버의 부하분산에 관한 연구)

  • Park, Jung-min; Kim, Hye-young; Cho, Sung Hyun
    • Journal of Korea Game Society / v.19 no.3 / pp.5-14 / 2019
  • The gaming server is based on a distributed server. To distribute the workloads of gaming servers, distributed gaming servers apply algorithms that divide each gaming server's workload into a balanced workload among the gaming servers and, as a result, efficiently manage the response time and fusibility of the server requested by the clients. In this paper, we propose a load balancing agent using PPO (Proximal Policy Optimization), which is one of the methods derived from a greedy algorithm and from Policy Gradient in reinforcement learning. The proposed load balancing agent is compared with previous research through simulation.

Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

  • Choi, Seung-Yoon; Le, Tuyen Pham; Chung, Tae-Choong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.23-31 / 2018
  • Recently, there have been many studies on machine learning, and reinforcement learning in particular is being actively studied. In this study, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, one of the latest deep reinforcement learning methods. We redefine the reward function for the bicycle dynamics and the neural network used to train the agent. With the proposed method for learning and control, the bicycle can be kept from falling over and can reach a more distant given destination, unlike with the existing method. For the performance evaluation, we tested the proposed algorithm in various environments such as fixed speed, random, target point, and not determined. The results confirm that the proposed algorithm performs better than the conventional neural network algorithms NAF and PPO.