Search | Korea Science

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization (Proximal Policy Optimization을 이용한 게임서버의 부하분산에 관한 연구)

Park, Jung-min;Kim, Hye-young;Cho, Sung Hyun
- Journal of Korea Game Society / v.19 no.3 / pp.5-14 / 2019
Gaming servers are based on a distributed server architecture. To distribute workloads, distributed gaming servers apply algorithms that divide each gaming server's workload into balanced workloads among the servers and, as a result, efficiently manage the response time and fusibility of the servers requested by clients. In this paper, we propose a load balancing agent using a greedy algorithm and Proximal Policy Optimization (PPO), a policy gradient method from reinforcement learning. The proposed load balancing agent is compared with previous research through simulation.
https://doi.org/10.7583/JKGS.2019.19.3.5

An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm

Kim, Hye-Young
- Journal of Information Processing Systems / v.17 no.2 / pp.297-305 / 2021
A large amount of data is being generated in gaming servers due to the increase in the number of users and the variety of game services being provided. In particular, load balancing schemes for gaming servers are a crucial consideration. The existing literature proposes algorithms that distribute loads in servers by mostly concentrating on load balancing and cooperative offloading. However, many proposed schemes impose heavy restrictions and assumptions, and such a limited service classification method is not enough to satisfy the wide range of service requirements. We propose a load balancing agent that combines the dynamic allocation programming method, a type of greedy algorithm, and proximal policy optimization, a reinforcement learning method. We also compare the performance of our proposed scheme with that of a scheme from previous literature, ProGreGA, by running a simulation.
https://doi.org/10.3745/JIPS.03.0158

Cloud Task Scheduling Based on Proximal Policy Optimization Algorithm for Lowering Energy Consumption of Data Center

Yang, Yongquan;He, Cuihua;Yin, Bo;Wei, Zhiqiang;Hong, Bowei
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1877-1891 / 2022
As a part of cloud computing technology, algorithms for cloud task scheduling have an important influence on cloud computing in data centers. In our earlier work, we proposed DeepEnergyJS, which was designed based on the original version of the policy gradient and reinforcement learning algorithm. We verified its effectiveness through simulation experiments. In this study, we used the Proximal Policy Optimization (PPO) algorithm to update DeepEnergyJS to DeepEnergyJSV2.0. First, we verify the convergence of the PPO algorithm on the Alibaba Cluster Data V2018 dataset. Then we contrast it with the reinforcement learning algorithm in terms of convergence rate, converged value, and stability. The results indicate that PPO performed better on the training and test data sets than the reinforcement learning algorithm, as well as other general heuristic algorithms such as First Fit, Random, and Tetris. DeepEnergyJSV2.0 achieves better energy efficiency than DeepEnergyJS by about 7.814%.
https://doi.org/10.3837/tiis.2022.06.006

Flight Trajectory Simulation via Reinforcement Learning in Virtual Environment (가상 환경에서의 강화학습을 이용한 비행궤적 시뮬레이션)

Lee, Jae-Hoon;Kim, Tae-Rim;Song, Jong-Gyu;Im, Hyun-Jae
- Journal of the Korea Society for Simulation / v.27 no.4 / pp.1-8 / 2018
The most common way to control a target point using artificial intelligence is through reinforcement learning. However, reinforcement learning has required complicated calculations that were difficult to implement. In this paper, the enhanced Proximal Policy Optimization (PPO) algorithm was used to simulate finding the planned flight trajectory to reach the target point in the virtual environment. In addition, variables such as changes in trajectory, effects of rewards, and external winds are added to determine the zero conditions of external environmental factors on flight trajectory learning, and the effects on trajectory learning performance and learning speed are compared. The simulation results show that the agent can find the optimal trajectory in spite of changes in the various external environments, which will be applicable to the actual vehicle.
https://doi.org/10.9709/JKSS.2018.27.4.001

A Study about the Usefulness of Reinforcement Learning in Business Simulation Games using PPO Algorithm (경영 시뮬레이션 게임에서 PPO 알고리즘을 적용한 강화학습의 유용성에 관한 연구)

Liang, Yi-Hong;Kang, Sin-Jin;Cho, Sung Hyun
- Journal of Korea Game Society / v.19 no.6 / pp.61-70 / 2019
In this paper, we apply reinforcement learning to management simulation games to check whether game agents autonomously achieve a given goal. In this system, we apply the PPO (Proximal Policy Optimization) algorithm in the Unity Machine Learning (ML) Agent environment, and the game agent is designed to automatically find a way to play. Five game scenario simulation experiments were conducted to verify its usefulness. As a result, it was confirmed that the game agent achieves the goal through learning despite changes in the environment variables of the game.
https://doi.org/10.7583/JKGS.2019.19.6.61

A Study on Asset Allocation Using Proximal Policy Optimization (근위 정책 최적화를 활용한 자산 배분에 관한 연구)

Lee, Woo Sik
- Journal of the Korean Society of Industry Convergence / v.25 no.4_2 / pp.645-653 / 2022
Recently, deep reinforcement learning has been applied to a variety of industries, such as games, robotics, autonomous vehicles, and data cooling systems. An algorithm called reinforcement learning allows for automated asset allocation without the requirement for ongoing monitoring. It is free to choose its own policies. The purpose of this paper is to carry out an empirical analysis of the performance of asset allocation strategies. Among the strategies considered were the conventional Mean-Variance Optimization (MVO) and the Proximal Policy Optimization (PPO). According to the findings, the PPO outperformed both its benchmark index and the MVO. This paper demonstrates how dynamic asset allocation can benefit from the development of a reinforcement learning algorithm.
https://doi.org/10.21289/KSIC.2022.25.4.645

Design of track path-finding simulation using Unity ML Agents

In-Chul Han;Jin-Woong Kim;Soo Kyun Kim
- Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.61-66 / 2024
This paper aims to design a simulation for path-finding of objects in a simulation or game environment using reinforcement learning techniques. The main feature of this study is that the objects in the simulation are trained to avoid obstacles generated at random locations on a given track and to automatically explore paths to collect items. To implement the simulation, the ML Agents provided by the Unity Game Engine were used, and a learning policy based on PPO (Proximal Policy Optimization) was established to form a reinforcement learning environment. By analyzing the simulation results and the learning result graph, we confirmed that, as it learns, the object moves along the track while avoiding obstacles and exploring paths to acquire items.
https://doi.org/10.9708/jksci.2024.29.02.061

Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policies via Object Manipulation with an Anthropomorphic Robot Hand (휴먼형 로봇 손의 사물 조작 수행을 이용한 사람 데모 결합 강화학습 정책 성능 평가)

Park, Na Hyeon;Oh, Ji Heon;Ryu, Ga Hyun;Lopez, Patricio Rivera;Anazco, Edwin Valarezo;Kim, Tae Seong
- KIPS Transactions on Software and Data Engineering / v.10 no.5 / pp.179-186 / 2021
Manipulating complex objects with an anthropomorphic robot hand, as a human hand does, is a challenge in human-centric environments. To train an anthropomorphic robot hand with a high degree of freedom (DoF), human demonstration augmented deep reinforcement learning policy optimization methods have been proposed. In this work, we first demonstrate that augmenting deep reinforcement learning (DRL) with human demonstration is effective for object manipulation by comparing the performance of the augmentation-free Natural Policy Gradient (NPG) and Demonstration Augmented NPG (DA-NPG). Then three DRL policy optimization methods, namely NPG, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO), are evaluated with DA (i.e., DA-NPG, DA-TRPO, and DA-PPO) and without DA by manipulating six objects: an apple, a banana, a bottle, a light bulb, a camera, and a hammer. The results show that DA-NPG achieved an average success rate of 99.33% whereas NPG only achieved 60%. In addition, DA-NPG succeeded in grasping all six objects, while DA-TRPO and DA-PPO failed to grasp some objects and showed unstable performance.
https://doi.org/10.3745/KTSDE.2021.10.5.179

Scheduling of Wafer Burn-In Test Process Using Simulation and Reinforcement Learning (강화학습과 시뮬레이션을 활용한 Wafer Burn-in Test 공정 스케줄링)

Soon-Woo Kwon;Won-Jun Oh;Seong-Hyeok Ahn;Hyun-Seo Lee;Hoyeoul Lee;In-Beom Park
- Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.107-113 / 2024
Scheduling of semiconductor test facilities is crucial because effective scheduling contributes to the profits of semiconductor enterprises and enhances the quality of semiconductor products. This study aims to solve the scheduling problems for the wafer burn-in test facilities of the semiconductor back-end process by utilizing simulation and deep reinforcement learning-based methods. To solve the scheduling problem considered in this study, we propose novel state, action, and reward designs based on the Markov decision process. Furthermore, a neural network is trained by employing a recent RL-based method named proximal policy optimization. Experimental results showed that the proposed method outperformed traditional heuristic-based scheduling techniques, achieving a higher due date compliance rate of jobs in terms of total job completion time.

PGA: An Efficient Adaptive Traffic Signal Timing Optimization Scheme Using Actor-Critic Reinforcement Learning Algorithm

Shen, Si;Shen, Guojiang;Shen, Yang;Liu, Duanyang;Yang, Xi;Kong, Xiangjie
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.11 / pp.4268-4289 / 2020
Advanced traffic signal timing methods play a very important role in reducing road congestion and air pollution. Many recent studies consider reinforcement learning a superior approach to building traffic light timing schemes. It achieves truly adaptive control by taking real-time traffic information as the state and adjustments to the traffic light scheme as the action. However, existing works are inefficient at complex intersections and lack feasibility because most of them adopt traffic light schemes whose phase sequence is flexible. To address these issues, a novel adaptive traffic signal timing scheme is proposed. It is based on an actor-critic reinforcement learning algorithm and integrates two advanced techniques, proximal policy optimization and generalized advantage estimation. In particular, a new kind of reward function and a simplified form of state representation are carefully defined, which improve the learning efficiency and reduce the computational complexity, respectively. Meanwhile, a fixed phase sequence signal scheme is derived, and a constraint on the variations of successive phase durations is introduced, which enhances its feasibility and robustness in field applications. The proposed scheme is verified through field-data-based experiments in both medium and high traffic density scenarios. Simulation results exhibit remarkable improvement in traffic performance as well as learning efficiency compared with existing reinforcement learning-based methods such as 3DQN and DDQN.
https://doi.org/10.3837/tiis.2020.11.002

