• Title/Summary/Keyword: Deep Q-Learning

Search Result 83, Processing Time 0.023 seconds

Improved Deep Q-Network Algorithm Using Self-Imitation Learning (Self-Imitation Learning을 이용한 개선된 Deep Q-Network 알고리즘)

  • Sunwoo, Yung-Min;Lee, Won-Chang
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.644-649
    • /
    • 2021
  • Self-Imitation Learning is a simple off-policy actor-critic algorithm that makes an agent find an optimal policy by using past good experiences. In case that Self-Imitation Learning is combined with reinforcement learning algorithms that have actor-critic architecture, it shows performance improvement in various game environments. However, its applications are limited to reinforcement learning algorithms that have actor-critic architecture. In this paper, we propose a method of applying Self-Imitation Learning to Deep Q-Network which is a value-based deep reinforcement learning algorithm and train it in various game environments. We also show that Self-Imitation Learning can be applied to Deep Q-Network to improve the performance of Deep Q-Network by comparing the proposed algorithm and ordinary Deep Q-Network training results.

Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning

  • Park, Ji Su;Park, Jong Hyuk
    • Journal of Information Processing Systems
    • /
    • v.16 no.5
    • /
    • pp.1001-1007
    • /
    • 2020
  • In recent years, machine learning algorithms are continuously being used and expanded in various fields, such as facial recognition, signal processing, personal authentication, and stock prediction. In particular, various algorithms, such as deep learning, reinforcement learning, and Q-learning, are continuously being improved. Among these algorithms, the expansion of deep learning is rapidly changing. Nevertheless, machine learning algorithms have not yet been applied in several fields, such as personal authentication technology. This technology is an essential tool in the digital information era, walking recognition technology as promising biometrics, and technology for solving state-space problems. Therefore, algorithm technologies of deep learning, reinforcement learning, and Q-learning, which are typical machine learning algorithms in various fields, such as agricultural technology, personal authentication, wireless network, game, biometric recognition, and image recognition, are being improved and expanded in this paper.

Applying CEE (CrossEntropyError) to improve performance of Q-Learning algorithm (Q-learning 알고리즘이 성능 향상을 위한 CEE(CrossEntropyError)적용)

  • Kang, Hyun-Gu;Seo, Dong-Sung;Lee, Byeong-seok;Kang, Min-Soo
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.1
    • /
    • pp.1-9
    • /
    • 2017
  • Recently, the Q-Learning algorithm, which is one kind of reinforcement learning, is mainly used to implement artificial intelligence system in combination with deep learning. Many research is going on to improve the performance of Q-Learning. Therefore, purpose of theory try to improve the performance of Q-Learning algorithm. This Theory apply Cross Entropy Error to the loss function of Q-Learning algorithm. Since the mean squared error used in Q-Learning is difficult to measure the exact error rate, the Cross Entropy Error, known to be highly accurate, is applied to the loss function. Experimental results show that the success rate of the Mean Squared Error used in the existing reinforcement learning was about 12% and the Cross Entropy Error used in the deep learning was about 36%. The success rate was shown.

Visual Analysis of Deep Q-network

  • Seng, Dewen;Zhang, Jiaming;Shi, Xiaoying
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.3
    • /
    • pp.853-873
    • /
    • 2021
  • In recent years, deep reinforcement learning (DRL) models are enjoying great interest as their success in a variety of challenging tasks. Deep Q-Network (DQN) is a widely used deep reinforcement learning model, which trains an intelligent agent that executes optimal actions while interacting with an environment. This model is well known for its ability to surpass skilled human players across many Atari 2600 games. Although DQN has achieved excellent performance in practice, there lacks a clear understanding of why the model works. In this paper, we present a visual analytics system for understanding deep Q-network in a non-blind matter. Based on the stored data generated from the training and testing process, four coordinated views are designed to expose the internal execution mechanism of DQN from different perspectives. We report the system performance and demonstrate its effectiveness through two case studies. By using our system, users can learn the relationship between states and Q-values, the function of convolutional layers, the strategies learned by DQN and the rationality of decisions made by the agent.

Applying Deep Reinforcement Learning to Improve Throughput and Reduce Collision Rate in IEEE 802.11 Networks

  • Ke, Chih-Heng;Astuti, Lia
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.334-349
    • /
    • 2022
  • The effectiveness of Wi-Fi networks is greatly influenced by the optimization of contention window (CW) parameters. Unfortunately, the conventional approach employed by IEEE 802.11 wireless networks is not scalable enough to sustain consistent performance for the increasing number of stations. Yet, it is still the default when accessing channels for single-users of 802.11 transmissions. Recently, there has been a spike in attempts to enhance network performance using a machine learning (ML) technique known as reinforcement learning (RL). Its advantage is interacting with the surrounding environment and making decisions based on its own experience. Deep RL (DRL) uses deep neural networks (DNN) to deal with more complex environments (such as continuous state spaces or actions spaces) and to get optimum rewards. As a result, we present a new approach of CW control mechanism, which is termed as contention window threshold (CWThreshold). It uses the DRL principle to define the threshold value and learn optimal settings under various network scenarios. We demonstrate our proposed method, known as a smart exponential-threshold-linear backoff algorithm with a deep Q-learning network (SETL-DQN). The simulation results show that our proposed SETL-DQN algorithm can effectively improve the throughput and reduce the collision rates.

A fast and simplified crack width quantification method via deep Q learning

  • Xiong Peng;Kun Zhou;Bingxu Duan;Xingu Zhong;Chao Zhao;Tianyu Zhang
    • Smart Structures and Systems
    • /
    • v.32 no.4
    • /
    • pp.219-233
    • /
    • 2023
  • Crack width is an important indicator to evaluate the health condition of the concrete structure. The crack width is measured by manual using crack width gauge commonly, which is time-consuming and laborious. In this paper, we have proposed a fast and simplified crack width quantification method via deep Q learning and geometric calculation. Firstly, the crack edge is extracted by using U-Net network and edge detection operator. Then, the intelligent decision of is made by the deep Q learning model. Further, the geometric calculation method based on endpoint and curvature extreme point detection is proposed. Finally, a case study is carried out to demonstrate the effectiveness of the proposed method, achieving high precision in the real crack width quantification.

A Research on Low-power Buffer Management Algorithm based on Deep Q-Learning approach for IoT Networks (IoT 네트워크에서의 심층 강화학습 기반 저전력 버퍼 관리 기법에 관한 연구)

  • Song, Taewon
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.4
    • /
    • pp.1-7
    • /
    • 2022
  • As the number of IoT devices increases, power management of the cluster head, which acts as a gateway between the cluster and sink nodes in the IoT network, becomes crucial. Particularly when the cluster head is a mobile wireless terminal, the power consumption of the IoT network must be minimized over its lifetime. In addition, the delay of information transmission in the IoT network is one of the primary metrics for rapid information collecting in the IoT network. In this paper, we propose a low-power buffer management algorithm that takes into account the information transmission delay in an IoT network. By forwarding or skipping received packets utilizing deep Q learning employed in deep reinforcement learning methods, the suggested method is able to reduce power consumption while decreasing transmission delay level. The proposed approach is demonstrated to reduce power consumption and to improve delay relative to the existing buffer management technique used as a comparison in slotted ALOHA protocol.

Development of Semi-Active Control Algorithm Using Deep Q-Network (Deep Q-Network를 이용한 준능동 제어알고리즘 개발)

  • Kim, Hyun-Su;Kang, Joo-Won
    • Journal of Korean Association for Spatial Structures
    • /
    • v.21 no.1
    • /
    • pp.79-86
    • /
    • 2021
  • Control performance of a smart tuned mass damper (TMD) mainly depends on control algorithms. A lot of control strategies have been proposed for semi-active control devices. Recently, machine learning begins to be applied to development of vibration control algorithm. In this study, a reinforcement learning among machine learning techniques was employed to develop a semi-active control algorithm for a smart TMD. The smart TMD was composed of magnetorheological damper in this study. For this purpose, an 11-story building structure with a smart TMD was selected to construct a reinforcement learning environment. A time history analysis of the example structure subject to earthquake excitation was conducted in the reinforcement learning procedure. Deep Q-network (DQN) among various reinforcement learning algorithms was used to make a learning agent. The command voltage sent to the MR damper is determined by the action produced by the DQN. Parametric studies on hyper-parameters of DQN were performed by numerical simulations. After appropriate training iteration of the DQN model with proper hyper-parameters, the DQN model for control of seismic responses of the example structure with smart TMD was developed. The developed DQN model can effectively control smart TMD to reduce seismic responses of the example structure.

Lane Change Methodology for Autonomous Vehicles Based on Deep Reinforcement Learning (심층강화학습 기반 자율주행차량의 차로변경 방법론)

  • DaYoon Park;SangHoon Bae;Trinh Tuan Hung;Boogi Park;Bokyung Jung
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.22 no.1
    • /
    • pp.276-290
    • /
    • 2023
  • Several efforts in Korea are currently underway with the goal of commercializing autonomous vehicles. Hence, various studies are emerging on autonomous vehicles that drive safely and quickly according to operating guidelines. The current study examines the path search of an autonomous vehicle from a microscopic viewpoint and tries to prove the efficiency required by learning the lane change of an autonomous vehicle through Deep Q-Learning. A SUMO was used to achieve this purpose. The scenario was set to start with a random lane at the starting point and make a right turn through a lane change to the third lane at the destination. As a result of the study, the analysis was divided into simulation-based lane change and simulation-based lane change applied with Deep Q-Learning. The average traffic speed was improved by about 40% in the case of simulation with Deep Q-Learning applied, compared to the case without application, and the average waiting time was reduced by about 2 seconds and the average queue length by about 2.3 vehicles.

Application of Deep Recurrent Q Network with Dueling Architecture for Optimal Sepsis Treatment Policy

  • Do, Thanh-Cong;Yang, Hyung Jeong;Ho, Ngoc-Huynh
    • Smart Media Journal
    • /
    • v.10 no.2
    • /
    • pp.48-54
    • /
    • 2021
  • Sepsis is one of the leading causes of mortality globally, and it costs billions of dollars annually. However, treating septic patients is currently highly challenging, and more research is needed into a general treatment method for sepsis. Therefore, in this work, we propose a reinforcement learning method for learning the optimal treatment strategies for septic patients. We model the patient physiological time series data as the input for a deep recurrent Q-network that learns reliable treatment policies. We evaluate our model using an off-policy evaluation method, and the experimental results indicate that it outperforms the physicians' policy, reducing patient mortality up to 3.04%. Thus, our model can be used as a tool to reduce patient mortality by supporting clinicians in making dynamic decisions.