Search | Korea Science

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity (오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도)

Kim, Chayoung;Park, Seohee;Lee, Woosik
- Journal of Internet Computing and Services / v.21 no.5 / pp.1-7 / 2020
Deep neural networks (DNNs), which are used as function approximators in reinforcement learning (RL), can in theory produce realistic results. In empirical benchmarks, temporal difference (TD) learning shows better results than Monte Carlo (MC) learning. However, some previous works show that MC is better than TD when the reward is very rare or delayed. Another recent study shows that when the information the agent observes from the environment is partial in complex control tasks, MC prediction is superior to TD-based methods. Most of these environments can be regarded as 5-step or 20-step Q-learning, where the experiment continues without long roll-outs in order to alleviate performance degradation. In other words, for noisy networks, regardless of the controlled roll-outs, it is better to learn with MC, which is more robust to noisy rewards than TD, or with a method almost identical to MC. These studies break with the view that TD is better than MC, and their results show that combining MC and TD is better than the theoretical approach. Therefore, building on the results of these previous studies, this study exploits a random balance, a mixture of TD and MC, in RL without the complicated reward formulas used in those studies. Comparing a DQN that uses the random MC and TD mixture with the well-known DQN that uses only TD-based learning, we demonstrate through experiments in OpenAI Gym that even well-performing TD learning benefits from the mixture of TD and MC.
https://doi.org/10.7472/jksii.2020.21.5.1
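
The abstract above describes randomly balancing MC and TD learning targets but does not give the exact rule. As a minimal sketch only, assuming a per-sample coin flip between the full Monte Carlo return and a one-step TD target, the idea could look like the following. The function names and the `p_mc` parameter are hypothetical and are not taken from the paper.

```python
import random


def discounted_return(rewards, gamma):
    """Monte Carlo return: discounted sum of rewards from this step to episode end."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward, next_q_max, done, gamma):
    """One-step TD target: reward plus discounted bootstrap value (zero at terminal)."""
    return reward + (0.0 if done else gamma * next_q_max)


def mixed_target(rewards_from_t, next_q_max, done, gamma=0.99, p_mc=0.5, rng=random):
    """Assumed mixing rule: use the MC return with probability p_mc, else the TD target."""
    if rng.random() < p_mc:
        return discounted_return(rewards_from_t, gamma)
    return td_target(rewards_from_t[0], next_q_max, done, gamma)
```

With `p_mc=0.0` this reduces to the ordinary TD target used by a standard DQN, and with `p_mc=1.0` it reduces to pure Monte Carlo regression, so the parameter spans the two extremes the abstract compares.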

Obstacle Avoidance of Mobile Robot Using Reinforcement Learning in Virtual Environment (가상 환경에서의 강화학습을 활용한 모바일 로봇의 장애물 회피)

Lee, Jong-lark
- Journal of Internet of Things and Convergence / v.7 no.4 / pp.29-34 / 2021
To apply reinforcement learning to a robot in a real environment, simulation in a virtual environment is necessary because a large number of learning iterations is required. In addition, it is difficult to apply a computation-heavy learning algorithm to a robot with low-spec hardware. In this study, ML-Agent, a reinforcement learning framework provided by Unity, was used as a virtual simulation environment to apply reinforcement learning to the obstacle collision avoidance problem of mobile robots with low-spec hardware. A DQN supported by ML-Agent was adopted as the reinforcement learning algorithm, and the results for a real robot show that collisions occurred fewer than 2 times per minute.
https://doi.org/10.20465/KIOTS.2021.7.4.029

Deep Reinforcement Learning-Based Cooperative Robot Using Facial Feedback (표정 피드백을 이용한 딥강화학습 기반 협력로봇 개발)

Jeon, Haein;Kang, Jeonghun;Kang, Bo-Yeong
- The Journal of Korea Robotics Society / v.17 no.3 / pp.264-272 / 2022
Human-robot cooperative tasks are increasingly required in daily life with the development of robotics and artificial intelligence technology. Interactive reinforcement learning strategies suggest that robots learn tasks by receiving feedback from an experienced human trainer during the training process. However, most previous studies on interactive reinforcement learning have required an extra feedback input device, such as a mouse or keyboard, in addition to the robot itself, and the scenarios in which a robot can interactively learn a task with a human have also been limited to virtual environments. To address these limitations, this paper studies training strategies for a robot that learns table balancing tasks interactively using deep reinforcement learning with feedback from human facial expressions. In the proposed system, the robot learns a cooperative table balancing task using Deep Q-Network (DQN), a deep reinforcement learning technique, with human facial emotion expression feedback. In the experiment, the proposed system achieved a high optimal policy convergence rate of up to 83.3% in training and a successful assumption rate of up to 91.6% in testing, showing improved performance compared to the model without human facial expression feedback.
https://doi.org/10.7746/jkros.2022.17.3.264

Optimal Design of Semi-Active Mid-Story Isolation System using Supervised Learning and Reinforcement Learning (지도학습과 강화학습을 이용한 준능동 중간층면진시스템의 최적설계)

Kang, Joo-Won;Kim, Hyun-Su
- Journal of Korean Association for Spatial Structures / v.21 no.4 / pp.73-80 / 2021
A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and showed good control performance. The control performance of a mid-story isolation system was enhanced by introducing semi-active control devices into the isolation system. The seismic response reduction capacity of a semi-active mid-story isolation system depends mainly on the effectiveness of the control algorithm. In this study, an AI (Artificial Intelligence)-based control algorithm was developed for control of a semi-active mid-story isolation system. An actual structure, the Shiodome Sumitomo building in Japan, which has a mid-story isolation system, was used as the example structure. An MR (magnetorheological) damper was used to make the mid-story isolation system of the example model semi-active. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, namely an RNN (Recurrent Neural Network). A Deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. The numerical simulation results show that the DQN algorithm can effectively control a semi-active mid-story isolation system, resulting in successful reduction of seismic responses.
https://doi.org/10.9712/KASS.2021.21.4.73

A DQN-based Two-Stage Scheduling Method for Real-Time Large-Scale EVs Charging Service

Tianyang Li;Yingnan Han;Xiaolong Li
- KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.551-569 / 2024
With the rapid development of the electric vehicle (EV) industry, EV charging service is becoming more and more important, especially when the air temperature drops suddenly or during holidays, when large numbers of EVs seek charging devices (CDs) within a short time. In such scenarios, an inefficient EV charging scheduling algorithm may lead to poor service quality, for example long queueing times for EVs and unreasonable idle time for charging devices. To deal with this issue, this paper proposes a Deep Q-Network (DQN)-based two-stage scheduling method for large-scale EV charging service. Fine-grained states with two delicate neural networks are proposed to optimize the sequencing of EVs and the charging station (CS) arrangement. Two efficient algorithms are presented to obtain the optimal EV charging scheduling scheme for large-scale EV charging demand. Three case studies show the superiority of our proposal in terms of high service quality (minimized average queuing time of EVs and maximized charging performance on both the EV and CS sides) and greater scheduling efficiency. The code and data are available at THE CODE AND DATA.
https://doi.org/10.3837/tiis.2024.03.002

Improving Dynamic Missile Defense Effectiveness Using Multi-Agent Deep Q-Network Model (멀티에이전트 기반 Deep Q-Network 모델을 이용한 동적 미사일 방어효과 개선)

Min Gook Kim;Dong Wook Hong;Bong Wan Choi;Ji Hoon Kyung
- Journal of Korean Society of Industrial and Systems Engineering / v.47 no.2 / pp.74-83 / 2024
The threat of North Korea's long-range firepower is recognized as a typical asymmetric threat, and South Korea is prioritizing the development of a Korean-style missile defense system to defend against it. Previous research modeled North Korean long-range artillery attacks as a Markov Decision Process (MDP) and used Approximate Dynamic Programming as the missile defense algorithm, but because of the limitations of that approach, deep reinforcement learning techniques that incorporate deep learning are now being considered. In this paper, we aim to develop a missile defense system algorithm by applying a modified DQN with multi-agent-based deep reinforcement learning techniques. We study whether an efficient missile defense system can be implemented in light of the style of attacks seen in recent wars, such as how effectively it can respond to enemy missile attacks, and show that the results learned through deep reinforcement learning are superior.
https://doi.org/10.11627/jksie.2024.47.2.074

Application of Reinforcement Learning in Detecting Fraudulent Insurance Claims

Choi, Jung-Moon;Kim, Ji-Hyeok;Kim, Sung-Jun
- International Journal of Computer Science & Network Security / v.21 no.9 / pp.125-131 / 2021
Detecting fraudulent insurance claims is difficult because the data are small and unbalanced. Some research has been carried out to better cope with various types of fraudulent claims. Technology for detecting fraudulent insurance claims is now increasingly used in the insurance and technology fields, thanks to artificial intelligence (AI) methods applied alongside traditional statistical detection and rule-based methods. This study obtained meaningful results for a fraudulent insurance claim detection model based on machine learning (ML) and deep learning (DL) technologies, using fraudulent insurance claim data from previous research. In our search for a way to improve the detection of fraudulent insurance claims, we investigated the reinforcement learning (RL) method and examined how it could be applied to this problem. Because there are few previous cases of applying RL here, we first had to define the essential RL elements based on previous research on anomaly detection. We applied the deep Q-network (DQN) and double deep Q-network (DDQN) to train the fraudulent insurance claim detection model, and confirmed that our model performed better than previous machine learning models.
https://doi.org/10.22937/IJCSNS.2021.21.9.17
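
The abstract above applies both DQN and DDQN without saying how they differ. As a rough illustration of the standard textbook distinction, and not of this paper's specific model, the two learning targets can be sketched as follows. The function names and the list-based Q-value inputs are assumptions made for this example.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what lets Double DQN reduce the overestimation that comes from taking a maximum over noisy Q-value estimates.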

DQN-Based Task Migration with Traffic Prediction in UAV-MEC assisted Vehicular Network (UAV-MEC지원 차량 네트워크에서 트래픽 예측을 통한 DQN기반 태스크 마이그레이션)

Shin, A Young;Lim, Yujin
- Annual Conference of KIPS / 2022.11a / pp.144-146 / 2022
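As computation-intensive tasks generated in vehicular environments increase, the need for mobile edge computing (MEC) is growing. However, ground-based MEC servers cannot respond flexibly when tasks surge temporarily, as during rush hours, and installing additional ground MEC servers to prepare for such situations wastes resources. To solve this problem, recent research has studied providing edge services by additionally using UAV (Unmanned Aerial Vehicle)-based MEC servers. Unlike ground MEC servers, however, UAV MEC servers have limited battery capacity, so their energy consumption must be minimized through load balancing among servers. This paper proposes a migration scheme that takes the energy consumption of UAV MEC servers into account. In addition, migration based on traffic prediction with a GRU (Gated Recurrent Unit) model is used to minimize latency. To evaluate the performance of the proposed system, experiments are conducted for different values of the threshold that determines when MEC migration occurs and for different vehicle densities, and performance is analyzed in terms of server load deviation, UAV MEC server energy consumption, and average latency.
https://doi.org/10.3745/PKIPS.y2022m11a.144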

Performance Comparison of Reinforcement Learning Algorithms for Futures Scalping (해외선물 스캘핑을 위한 강화학습 알고리즘의 성능비교)

Jung, Deuk-Kyo;Lee, Se-Hun;Kang, Jae-Mo
- The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.697-703 / 2022
Due to the recent economic downturn caused by Covid-19 and the unstable international situation, many investors are choosing the derivatives market as a means of investment. However, the derivatives market carries greater risk than the stock market, and research on the market by market participants is insufficient. Recently, with the development of artificial intelligence, machine learning has been widely used in the derivatives market. In this paper, reinforcement learning, one of the machine learning techniques, is applied to analyze the scalping technique, which trades futures on a minute scale. The data set consists of 21 attributes built from the closing price, moving average line, and Bollinger band indicators of 1-minute and 3-minute data over 6 months, for 4 products selected from the futures products traded at a trading firm. In the experiment, a DNN (deep neural network) model and three reinforcement learning algorithms, namely DQN (Deep Q-Network), A2C (Advantage Actor Critic), and A3C (Asynchronous A2C), were trained and validated on training and test data sets. For scalping, the agent chooses between the actions of buying and selling, and the ratio of the portfolio value resulting from the action is given as the reward. Experimental results show that energy sector products such as Heating Oil and Crude Oil yield relatively high cumulative returns compared to index sector products such as Mini Russell 2000 and Hang Seng Index.
https://doi.org/10.17703/JCCT.2022.8.5.697

Optimization of Dam Discharge in Drought Conditions Using Reinforcement Learning (강화학습을 이용한 가뭄 상황에서의 댐 방류량 최적화)

Hajin Noh;Yujin Lim
- Annual Conference of KIPS / 2023.05a / pp.606-608 / 2023
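As severe drought has persisted in recent years, supplying water through dams has become difficult. This paper proposes a technique for saving the water that is wasted under such drought conditions by controlling the dam's own discharge. The DQN algorithm is used to optimize the discharge so that the reservoir storage stays at or above the target level for 60 days, and performance is analyzed by comparing the results obtained when the weight of the discharge within the algorithm is changed.
https://doi.org/10.3745/PKIPS.y2023m05a.606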
