• Title/Summary/Keyword: deep Q learning

DQN Reinforcement Learning for Mountain-Car in OpenAI Gym Environment (OpenAI Gym 환경의 Mountain-Car에 대한 DQN 강화학습)

  • Myung-Ju Kang
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.375-377 / 2024
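  • In this paper, we apply DQN (Deep Q-Networks) reinforcement learning to the Mountain-Car-v0 game, which can be controlled with a simple program in the OpenAI Gym environment. The DQN network used in this paper consists of one input layer, three hidden layers, and one output layer; ReLU is used as the activation function in the input and hidden layers, and a linear function is used in the output layer. In the experiments, we examined the reward obtained in each episode while training DQN on Mountain-Car-v0 and analyzed how many episodes fell into each reward interval. The results show that, out of 100 episodes in total, 85 episodes obtained a reward of 50 or more.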
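
The layer layout described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the hidden width of 24, the weight initialization, and all function names are assumptions chosen for illustration. Only the layer count, the ReLU/linear activations, and the Mountain-Car-v0 dimensions (2 state variables, 3 discrete actions) come from the abstract and the environment.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (initialization is an assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_q_network(state_dim=2, hidden=24, n_actions=3, seed=0):
    """Input layer + 3 hidden layers (ReLU) followed by a linear output layer.

    The hidden width of 24 is hypothetical; the abstract does not state it.
    """
    rng = random.Random(seed)
    sizes = [state_dim, hidden, hidden, hidden, hidden, n_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def q_values(network, state):
    """Forward pass: ReLU on every layer except the last, which stays linear."""
    x = list(state)
    for weights, biases in network[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = network[-1]
    return dense(x, weights, biases)


def greedy_action(network, state):
    """Index of the action with the largest estimated Q-value."""
    q = q_values(network, state)
    return max(range(len(q)), key=q.__getitem__)
```

Calling `greedy_action(build_q_network(), [-0.5, 0.0])` returns an action index for a car at position -0.5 with zero velocity; an untrained network gives an arbitrary choice.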

Development of Deep Learning Model for Fingerprint Identification at Digital Mobile Radio (무선 단말기 Fingerprint 식별을 위한 딥러닝 구조 개발)

  • Jung, Young-Giu;Shin, Hak-Chul;Nah, Sun-Phil
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.1 / pp.7-13 / 2022
  • Radio frequency (RF) fingerprinting refers to a methodology that extracts hardware-specific characteristics of a transmitter that are unintentionally embedded in a transmitted waveform. In this paper, we put forward a fingerprinting feature and a deep learning structure that can distinguish Digital Mobile Radio (DMR) devices of the same type from their in-phase (I) and quadrature (Q) components. We propose using the magnitude of I/Q in polar coordinates as the RF fingerprinting feature, together with a modified ResNet-1D structure that can identify the devices. Experimental results show that the proposed modified ResNet-1D structure achieves a recognition accuracy of 99.5% on 20 DMR devices.
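
As a small illustration of the feature named in this abstract, the polar-coordinate magnitude of an I/Q sample is sqrt(I^2 + Q^2). The sketch below computes it per sample; the function name and the list-based input format are assumptions, and the paper's preprocessing pipeline is not described in the abstract.

```python
import math


def iq_magnitude(i_samples, q_samples):
    """Per-sample magnitude sqrt(I^2 + Q^2) of paired I/Q samples."""
    if len(i_samples) != len(q_samples):
        raise ValueError("I and Q sequences must have the same length")
    return [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]
```

The resulting one-dimensional sequence is the kind of input a 1D convolutional network would consume.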

Enhancing Service Availability in Multi-Access Edge Computing with Deep Q-Learning

  • Lusungu Josh Mwasinga;Syed Muhammad Raza;Duc-Tai Le;Moonseong Kim;Hyunseung Choo
    • Journal of Internet Computing and Services / v.24 no.2 / pp.1-10 / 2023
  • The Multi-access Edge Computing (MEC) paradigm equips network edge telecommunication infrastructure with cloud computing resources. It seeks to transform the edge into an IT services platform for hosting resource-intensive and delay-stringent services for mobile users, thereby significantly enhancing perceived quality of experience. However, erratic user mobility impedes seamless service continuity and makes it hard to satisfy delay-stringent service requirements, especially as users roam farther away from the serving MEC resource, which deteriorates quality of experience. This work proposes a deep reinforcement learning based service mobility management approach for ensuring seamless migration of service instances along with user mobility. The proposed approach focuses on the problem of selecting the optimal MEC resource to host services for high-mobility users, thereby reducing the service migration rejection rate and enhancing service availability. The efficacy of the proposed approach is confirmed through simulation experiments, whose results show that, on average, the proposed scheme reduces service delay by 8%, task computing time by 36%, and migration rejection rate by more than 90% compared to a baseline scheme.

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity (오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도)

  • Kim, Chayoung;Park, Seohee;Lee, Woosik
    • Journal of Internet Computing and Services / v.21 no.5 / pp.1-7 / 2020
  • Deep neural networks (DNNs), which are used as function approximators in reinforcement learning (RL), can in theory produce realistic results. In empirical benchmark studies, temporal difference (TD) learning shows better results than Monte Carlo (MC) learning. However, some previous works show that MC is better than TD when rewards are very rare or delayed. Another recent study shows that MC prediction is superior to TD-based methods when the information the agent observes from the environment is partial in complex control tasks. Most of these environments can be regarded as 5-step or 20-step Q-learning, where the experiment continues without long roll-outs in order to alleviate performance degradation. In other words, for noisy networks, regardless of the controlled roll-outs, it is better to learn with MC, which is more robust to noisy rewards than TD, or with a method almost identical to MC. These studies break with the view that TD is better than MC, and their results show that combining MC and TD works better than theory suggests. Therefore, based on the results of these previous studies, we attempt to exploit a random balance with a mixture of TD and MC in RL, without the complicated reward-based formulas those studies use. Comparing a DQN that uses the random mixture of MC and TD with the well-known DQN that uses only TD-based learning, we demonstrate through experiments in OpenAI Gym that even well-performing TD learning benefits from the mixture of TD and MC.
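
One way to picture a random balance between MC and TD targets is sketched below. This is only an illustrative guess at the idea, not the authors' algorithm: the per-step coin flip, the probability `p_mc`, and all function names are assumptions, since the abstract does not specify how the mixture is drawn.

```python
import random


def mc_returns(rewards, gamma):
    """Discounted Monte Carlo return for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def td_targets(rewards, next_q_max, dones, gamma):
    """One-step TD targets r + gamma * max_a Q(s', a), without bootstrapping at terminal steps."""
    return [r + (0.0 if done else gamma * q)
            for r, q, done in zip(rewards, next_q_max, dones)]


def mixed_targets(rewards, next_q_max, dones, gamma=0.99, p_mc=0.5, rng=None):
    """For each step, pick the MC return with probability p_mc, else the TD target.

    p_mc and the per-step selection rule are hypothetical choices.
    """
    rng = rng or random.Random()
    mc = mc_returns(rewards, gamma)
    td = td_targets(rewards, next_q_max, dones, gamma)
    return [m if rng.random() < p_mc else t for m, t in zip(mc, td)]
```

Setting `p_mc=0.0` recovers ordinary one-step TD targets and `p_mc=1.0` recovers pure MC returns, which makes the two baselines easy to compare against the mixture.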

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology / v.54 no.9 / pp.3283-3292 / 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental action is proposed for the control system of the once-through steam generator (OTSG) with sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained by the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates the network using the rewards obtained during the interaction. In this way, we transform the active fault-tolerant control process of the OTSG into the decision-making process of the reinforcement learning agent. Comparison experiments against the traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper can control the system accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG can run normally and stably.

Deep Reinforcement Learning-Based Edge Caching in Heterogeneous Networks

  • Yoonjeong, Choi; Yujin, Lim
    • Journal of Information Processing Systems / v.18 no.6 / pp.803-812 / 2022
  • With the increasing number of mobile device users worldwide, utilizing mobile edge computing (MEC) devices close to users for content caching can reduce transmission latency compared with receiving content from a server or cloud. However, because MEC has limited storage capacity, it is necessary to determine the content types and sizes to be cached. In this study, we investigate a caching strategy that increases the hit ratio from small base stations (SBSs) for mobile users in a heterogeneous network consisting of one macro base station (MBS) and multiple SBSs. If there are several SBSs that users can access, the hit ratio can be improved by reducing duplicate content and increasing the diversity of content in SBSs. We propose a Deep Q-Network (DQN)-based caching strategy that considers time-varying content popularity and content redundancy in multiple SBSs. Content is stored in the SBS in a divided form using maximum distance separable (MDS) codes to enhance the diversity of the content. Experiments in various environments show that the proposed caching strategy outperforms the other methods in terms of hit ratio.

Machine Scheduling Models Based on Reinforcement Learning for Minimizing Due Date Violation and Setup Change (납기 위반 및 셋업 최소화를 위한 강화학습 기반의 설비 일정계획 모델)

  • Yoo, Woosik;Seo, Juhyeok;Kim, Dahee;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.24 no.3 / pp.19-33 / 2019
  • Recently, manufacturers have been struggling to use production equipment efficiently as their production methods become more sophisticated and complex. A typical factor hindering the efficiency of the manufacturing process is the setup cost caused by job changes. Efficient use of equipment is especially important in processes that use expensive production equipment, such as semiconductor and LCD processes. Balancing the tradeoff between meeting deadlines and minimizing the setup cost incurred by changes of work type is a crucial planning task. In this study, we developed a scheduling model that uses reinforcement learning to minimize due date violations and setup costs on parallel machines with due dates and work preparation costs. The proposed model is a reinforcement learning-based Deep Q-Network (DQN) scheduling model. To validate the effectiveness of the proposed model, we compared it against a heuristic model and a DNN (deep neural network) based model. It was confirmed that the proposed DQN method causes fewer due date violations and lower setup costs than the benchmark methods.

Study on Q-value prediction ahead of tunnel excavation face using recurrent neural network (순환인공신경망을 활용한 터널굴착면 전방 Q값 예측에 관한 연구)

  • Hong, Chang-Ho;Kim, Jin;Ryu, Hee-Hwan;Cho, Gye-Chun
    • Journal of Korean Tunnelling and Underground Space Association / v.22 no.3 / pp.239-248 / 2020
  • Accurate rock classification helps suitable support patterns to be installed. Face mapping is usually conducted to classify the rock mass using RMR (Rock Mass Rating) or Q values. There have been several attempts to predict the rock mass grade by applying deep learning to mechanical data from jumbo drills or probe drills and to photographs of excavation surfaces. However, these approaches take a long time or cannot determine the rock grade ahead of the tunnel face. In this study, a method to predict the Q value ahead of the excavation surface is developed using a recurrent neural network (RNN), and it is compared with the Q values from face mapping for verification. Of the Q values from over 4,600 tunnel faces, 70% were used for training and the rest for verification. Training was repeated with different numbers of training repetitions and different numbers of previous excavation surfaces used for training. The agreement between the predicted and actual Q values was evaluated with the root mean square error (RMSE). The lowest RMSE was obtained with 600 training repetitions and 2 prior excavation faces. Although the results of this study can vary with the input data sets, they can help to understand how past ground conditions affect future ground conditions and to predict the Q value ahead of the tunnel excavation face.
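
The RMSE used in this abstract to compare predicted and actual Q values follows the standard definition, the square root of the mean squared difference. A minimal sketch is given below; the function name and list-based inputs are assumptions, and the abstract does not say how the authors computed it in practice.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the predicted Q values track the face-mapping Q values more closely, which is how the training configurations in the abstract are ranked.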

Performance Comparison of Deep Reinforcement Learning based Computation Offloading in MEC (MEC 환경에서 심층 강화학습을 이용한 오프로딩 기법의 성능비교)

  • Moon, Sungwon;Lim, Yujin
    • Annual Conference of KIPS / 2022.05a / pp.52-55 / 2022
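  • With the exponential growth of smart mobile devices in the 5G era, multi-access edge computing (MEC) has emerged as a promising technology. Research on offloading to MEC servers in order to provide computation-intensive services with low latency is attracting attention, particularly for MEC system environments in which the task arrival rate and the wireless channel state are stochastic. In this paper, we propose a deep reinforcement learning-based offloading scheme that allocates computation resources for local execution and transmission power for offloading in order to minimize the vehicle's power consumption and latency. We compare and analyze the performance of a Deep Deterministic Policy Gradient (DDPG)-based scheme and a Deep Q-network (DQN)-based scheme in terms of the vehicle's power consumption and queuing delay.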

Trading Strategies Using Reinforcement Learning (강화학습을 이용한 트레이딩 전략)

  • Cho, Hyunmin;Shin, Hyun Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.123-130 / 2021
  • With the recent developments in computer technology, there has been an increasing interest in the field of machine learning. This has also led to a significant increase in real business applications of machine learning theory in various sectors. In finance, it has been a major challenge to predict the future value of financial products. Since the 1980s, the finance industry has relied on technical and fundamental analysis for this prediction. For future value prediction models using machine learning, model design is of paramount importance for responding to market variables. Therefore, this paper quantitatively predicts the price movements of individual stocks listed on the KOSPI market using machine learning techniques, specifically reinforcement learning models. The DQN and A2C algorithms proposed by Google DeepMind in 2013 are used for the reinforcement learning, and they are applied to stock trading strategies. In addition, through experiments, an input value that increases the cumulative profit is selected, and its superiority is verified by comparison with other algorithms.