• Title/Summary/Keyword: Model based reinforcement learning

Dynamic Computation Offloading Based on Q-Learning for UAV-Based Mobile Edge Computing

  • Shreya Khisa;Sangman Moh
    • Smart Media Journal
    • /
    • v.12 no.3
    • /
    • pp.68-76
    • /
    • 2023
  • Emerging mobile edge computing (MEC) can be used in battery-constrained Internet of things (IoT). The execution latency of IoT applications can be improved by offloading computation-intensive tasks to an MEC server. Recently, the popularity of unmanned aerial vehicles (UAVs) has increased rapidly, and UAV-based MEC systems are receiving considerable attention. In this paper, we propose a dynamic computation offloading paradigm for UAV-based MEC systems, in which a UAV flies over an urban environment and provides edge services to IoT devices on the ground. Since most IoT devices are energy-constrained, we formulate our problem as a Markov decision process considering the energy level of the battery of each IoT device. We also use model-free Q-learning for time-critical tasks to maximize the system utility. According to our performance study, the proposed scheme can achieve desirable convergence properties and make intelligent offloading decisions.
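
The Q-learning formulation is only described at a high level here; a minimal tabular sketch of the offload-or-compute-locally decision, with the battery-level discretization, the two-action set, and the latency/energy reward all assumed for illustration, might look like this:

```python
import numpy as np

# Assumed discretization: 10 battery levels, 2 actions (0 = execute locally, 1 = offload to UAV MEC)
N_LEVELS, N_ACTIONS = 10, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_LEVELS, N_ACTIONS))
rng = np.random.default_rng(0)

def choose_action(battery_level):
    """Epsilon-greedy selection of the offloading decision for one IoT device."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[battery_level]))

def q_update(state, action, reward, next_state):
    """Standard one-step Q-learning update of the offloading policy."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])

def utility(latency, energy_used):
    """Illustrative reward: penalize task latency and device energy consumption."""
    return -(latency + 0.5 * energy_used)
```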

A slide reinforcement learning for the consensus of a multi-agents system (다중 에이전트 시스템의 컨센서스를 위한 슬라이딩 기법 강화학습)

  • Yang, Janghoon
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.4
    • /
    • pp.226-234
    • /
    • 2022
  • With advances in autonomous vehicles and networked control, there is growing interest in the consensus control of multi-agent systems, which controls multiple agents in a distributed way rather than controlling a single agent. Since consensus control is a form of distributed control, it is bound to suffer delay in a practical system. In addition, it is often difficult to obtain a very accurate mathematical model of a system. Even though reinforcement learning (RL) methods have been developed to deal with these issues, they often converge slowly in the presence of large uncertainties. Thus, we propose slide RL, which combines sliding mode control with RL to be robust to such uncertainties. The structure of a sliding mode controller is introduced into the RL action, while an auxiliary sliding variable is included in the state information. Numerical simulation results show that slide RL provides performance comparable to model-based consensus control in the presence of unknown time-varying delay and disturbance, while outperforming existing state-of-the-art RL-based consensus algorithms.
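
The abstract states the key mechanism concretely: a sliding-mode structure in the action and an auxiliary sliding variable in the state. A minimal sketch of that coupling, with the sliding-surface slope, switching gain, and error signals all assumed rather than taken from the paper, could look like this:

```python
import numpy as np

LAMBDA, K_SMC = 1.0, 0.5   # assumed sliding-surface slope and switching gain

def sliding_variable(error, error_rate):
    """Auxiliary sliding variable s = de/dt + lambda * e, appended to the RL state."""
    return error_rate + LAMBDA * error

def augmented_state(observation, error, error_rate):
    """State fed to the RL agent: the raw observation plus the sliding variable."""
    return np.append(observation, sliding_variable(error, error_rate))

def control_action(rl_correction, error, error_rate):
    """Action with a sliding-mode structure: a switching term plus a learned correction."""
    s = sliding_variable(error, error_rate)
    return -K_SMC * np.sign(s) + rl_correction
```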

Optimal sensor placement for structural health monitoring based on deep reinforcement learning

  • Xianghao Meng;Haoyu Zhang;Kailiang Jia;Hui Li;Yong Huang
    • Smart Structures and Systems
    • /
    • v.31 no.3
    • /
    • pp.247-257
    • /
    • 2023
  • In structural health monitoring of large-scale structures, optimal sensor placement plays an important role because of the high cost of sensors and their supporting instruments, as well as the burden of data transmission and storage. In this study, a vibration sensor placement algorithm based on deep reinforcement learning (DRL) is proposed, which can effectively solve non-convex, high-dimensional, and discrete combinatorial sensor placement optimization problems. An objective function is constructed to estimate the quality of a specific vibration sensor placement scheme according to the modal assurance criterion (MAC). Using this objective function, a DRL-based algorithm is presented to determine the optimal vibration sensor placement scheme. Subsequently, we transform the optimal sensor placement process into a Markov decision process and employ a DRL-based optimization algorithm to maximize the objective function for optimal sensor placement. To illustrate the applicability of the proposed method, two examples are presented: a 10-story braced frame and a sea-crossing bridge model. A comparison study is also performed with a genetic algorithm and a particle swarm algorithm. The proposed DRL-based algorithm can effectively solve the discrete combinatorial optimization problem for vibration sensor placements and produces superior performance compared with the other two existing methods.
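
The objective function is said to be built on the modal assurance criterion (MAC). As a sketch of how one candidate placement could be scored (the paper's exact objective is not given here, so the max-off-diagonal MAC penalty below is an assumption):

```python
import numpy as np

def mac_matrix(mode_shapes, sensor_dofs):
    """MAC matrix restricted to the selected sensor DOFs.
    mode_shapes: (n_dofs, n_modes) array of mode-shape vectors."""
    phi = mode_shapes[sensor_dofs, :]        # rows: chosen sensor locations
    gram = phi.T @ phi                       # pairwise inner products between modes
    norms = np.diag(gram)
    return gram**2 / np.outer(norms, norms)

def placement_score(mode_shapes, sensor_dofs):
    """Assumed objective: penalize the largest off-diagonal MAC value so that the
    measured mode shapes remain as distinguishable as possible."""
    mac = mac_matrix(mode_shapes, sensor_dofs)
    off_diag = mac - np.diag(np.diag(mac))
    return -np.max(off_diag)

# Toy usage: score 4 sensors placed on a 20-DOF model with 5 modes (random data)
rng = np.random.default_rng(0)
modes = rng.standard_normal((20, 5))
print(placement_score(modes, [0, 5, 10, 15]))
```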

An Automatic Cooperative coordination Model for the Multiagent System using Reinforcement Learning (강화학습을 이용한 멀티 에이전트 시스템의 자동 협력 조정 모델)

  • 정보윤;윤소정;오경환
    • Korean Journal of Cognitive Science
    • /
    • v.10 no.1
    • /
    • pp.1-11
    • /
    • 1999
  • Agent-based systems technology has generated a lot of excitement in recent years because of its promise as a new paradigm for conceptualizing, designing, and implementing software systems. In particular, multi-agent systems have attracted much research because their characteristics fit distributed and open Internet environments. In a multi-agent system, agents must cooperate with each other through a coordination procedure when conflicts between agents arise, which happens because each agent acts toward its own purpose without coordination. However, previous coordination methods for multi-agent systems cannot correctly solve the cooperation problem between agents that have different goals in a dynamic environment. In this paper, we solve the cooperation problem of multi-agent systems with multiple goals in a dynamic environment, using an automatic cooperative coordination model based on reinforcement learning. We present two pursuit problems that extend a traditional problem in the multi-agent systems area to model the restriction of multiple goals in a dynamic environment, and we verify the validity of the proposed model experimentally.
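
The abstract applies reinforcement learning to an extended pursuit problem but does not detail the algorithm, so the sketch below simply shows independent tabular Q-learners (one per pursuer) over a shared state encoding, a common baseline setup for this kind of coordination experiment rather than the paper's exact model:

```python
import numpy as np
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right", "stay"]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2

class PursuitAgent:
    """Independent Q-learner for one pursuer; cooperation emerges through shared rewards.
    A state could be, e.g., a tuple of positions of all pursuers relative to the prey."""
    def __init__(self, seed=0):
        self.q = defaultdict(lambda: np.zeros(len(ACTIONS)))
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        if self.rng.random() < EPSILON:
            return int(self.rng.integers(len(ACTIONS)))
        return int(np.argmax(self.q[state]))

    def learn(self, state, action, reward, next_state):
        target = reward + GAMMA * np.max(self.q[next_state])
        self.q[state][action] += ALPHA * (target - self.q[state][action])
```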

Reinforcement Learning-based Dynamic Weapon Assignment to Multi-Caliber Long-Range Artillery Attacks (다종 장사정포 공격에 대한 강화학습 기반의 동적 무기할당)

  • Hyeonho Kim;Jung Hun Kim;Joohoe Kong;Ji Hoon Kyung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.42-52
    • /
    • 2022
  • North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea kicked off the development of an anti-artillery interception system, similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system may not work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view this assignment task as a dynamic weapon target assignment (DWTA) problem. DWTA is a multistage decision process in which a decision in one stage affects the decision processes and their results in subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers, positioned near the border, limits the processing time of the model solver to only a few seconds. It is impossible to compute the exact optimal solution within the allowed time interval due to the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement learning-based algorithms to obtain approximate solutions of the MDP model within the time limit. To check the quality of the approximate solutions, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results show that both algorithms provide better solutions than the baseline strategy.
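
The abstract frames interceptor assignment as a multistage MDP but does not spell out the state and action encoding. The sketch below shows one plausible (assumed) state representation and a simple myopic assignment rule, included only to make the decision structure concrete; it is neither the paper's RL algorithms nor its SSL baseline:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DWTAState:
    """Illustrative MDP state for dynamic weapon-target assignment (field names assumed)."""
    stage: int                     # current decision stage
    threats: List[float]           # survival probability of each incoming rocket
    interceptors_left: List[int]   # remaining interceptors per interceptor type

def greedy_assign(state: DWTAState, p_kill: List[List[float]]) -> List[int]:
    """Myopic rule: give each threat the in-stock interceptor type with the highest
    single-shot kill probability (an illustrative baseline, not the paper's SSL policy)."""
    assignment = []
    for t, _ in enumerate(state.threats):
        candidates = [k for k, n in enumerate(state.interceptors_left) if n > 0]
        best = max(candidates, key=lambda k: p_kill[k][t], default=-1)
        if best >= 0:
            state.interceptors_left[best] -= 1
        assignment.append(best)
    return assignment
```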

Analysis and study of Deep Reinforcement Learning based Resource Allocation for Renewable Powered 5G Ultra-Dense Networks

  • Hamza Ali Alshawabkeh
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.1
    • /
    • pp.226-234
    • /
    • 2024
  • The frequent handover problem and ping-pong effects in 5G (5th Generation) ultra-dense networking cannot be effectively resolved by conventional handover decision methods, which rely on handover thresholds and measurement reports. For instance, millimeter-wave LANs, broadband wireless access techniques, and 5G/6G networks are examples of next-generation systems that demand greater security, lower latency, and reliable standards and communication capacity. Effective congestion management is considered one of the critical parts of 5G and 6G technology. With improved service quality, it enables an operator to run many networking simulations on a single connection. To guarantee load balancing, prevent network slice failure, and provide substitute slices in case of congestion or slice failure, a sophisticated decision-making framework for handling arriving network data is required. Our goal is to balance the strain on BSs while optimizing the value of the information that is transferred from satellites to BSs. Nevertheless, due to their irregular flight characteristics, some satellites frequently cannot establish a connection with Base Stations (BSs), which further complicates the joint satellite-BS connection and channel allocation. SF redistribution techniques based on Deep Reinforcement Learning (DRL) have been devised, taking into account the randomness of the data received by the terminal. In order to predict the best capacity improvements in the wireless instruments of 5G and 6G IoT networks, a hybrid deep learning algorithm is used in this study. To control the level of congestion within a 5G/6G network, the suggested approach is applied to a training set. With 0.933 accuracy and a 0.067 miss rate, the suggested method produced encouraging results.
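
The abstract mentions balancing load across base stations with reinforcement learning but gives few algorithmic specifics. Purely as a toy illustration, with every quantity below assumed, an epsilon-greedy tabular agent choosing the serving BS from discretized per-BS loads could be sketched as:

```python
import numpy as np

# Assumed setup: 4 base stations, per-BS load normalized to [0, 1] and binned into 5 levels.
N_BS, N_LOAD_LEVELS = 4, 5
Q = np.zeros((N_LOAD_LEVELS,) * N_BS + (N_BS,))
rng = np.random.default_rng(0)

def discretize(loads):
    """Bin the normalized load of each BS into a discrete state."""
    return tuple(min(int(l * N_LOAD_LEVELS), N_LOAD_LEVELS - 1) for l in loads)

def select_bs(loads, epsilon=0.1):
    """Epsilon-greedy choice of serving base station for an arriving flow."""
    state = discretize(loads)
    if rng.random() < epsilon:
        return int(rng.integers(N_BS))
    return int(np.argmax(Q[state]))
```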

Retained Message Delivery Scheme utilizing Reinforcement Learning in MQTT-based IoT Networks (MQTT 기반 IoT 네트워크에서 강화학습을 활용한 Retained 메시지 전송 방법)

  • Yeunwoong Kyung;Tae-Kook Kim;Youngjun Kim
    • Journal of Internet of Things and Convergence
    • /
    • v.10 no.2
    • /
    • pp.131-135
    • /
    • 2024
  • In the MQTT protocol, if the retained flag of a message published by a publisher is set, the message is stored in the broker as a retained message. When a new subscriber subscribes to the topic, the broker immediately sends the retained message, which allows the new subscriber to update its view of the current state without waiting for new messages from the publisher. However, sending retained messages can become a traffic overhead if new messages are frequently published by the publisher, and this overhead grows when new subscribers subscribe frequently. Therefore, in this paper, we propose a retained message delivery scheme that considers the characteristics of the published messages. We model the broker's choice between delivering to a new subscriber and waiting using reinforcement learning, and determine the optimal policy through the Q-learning algorithm. Through performance analysis, we confirm that the proposed method shows improved performance compared to existing methods.
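
The broker's choice, deliver the retained message immediately or wait for the next publish, maps naturally onto a small tabular Q-learning problem. The sketch below assumes the state is a discretized publish rate of the topic and leaves the reward abstract; neither detail is specified in the abstract:

```python
import numpy as np

# Broker actions when a new subscriber arrives (per the paper's framing):
# 0 = deliver the stored retained message now, 1 = wait for the next published message.
DELIVER, WAIT = 0, 1
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

N_RATE_BINS = 8                      # assumed binning of the topic's publish rate
Q = np.zeros((N_RATE_BINS, 2))
rng = np.random.default_rng(0)

def state_from_publish_rate(msgs_per_sec, max_rate=10.0):
    """Map the observed publish rate to a discrete state index."""
    return min(int(msgs_per_sec / max_rate * N_RATE_BINS), N_RATE_BINS - 1)

def broker_action(state):
    if rng.random() < EPSILON:
        return int(rng.integers(2))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """One-step Q-learning update; the reward would trade the freshness of the
    subscriber's view against the traffic cost of sending a soon-to-be-stale message."""
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
```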

Relation Extraction Model for Noisy Data Handling on Distant Supervision Data based on Reinforcement Learning (원격지도학습데이터의 오류를 처리하는 강화학습기반 관계추출 모델)

  • Yoon, Sooji;Nam, Sangha;Kim, Eun-kyung;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.55-60
    • /
    • 2018
  • When designing a machine learning-based relation extraction model, training data are collected by distant supervision in order to obtain a large amount of labeled data quickly. Because such data can be mislabeled and still used for training, they can negatively affect the model's performance. In this paper, we address this problem with a reinforcement learning approach. The proposed model consists of a sentence selector, which finds good-quality sentences among the mislabeled data, and a relation extractor, which is trained on the selected sentences to extract relations. The sentence selector is trained without supervised training data, using feedback from the relation extractor. This approach showed better performance than existing relation extraction models and consequently demonstrated a way to overcome the drawbacks of distantly supervised training data.
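
A selector-plus-extractor loop of this kind is typically trained with a policy-gradient update. Below is a minimal sketch with a logistic sentence selector over precomputed sentence features; the feature dimension, update rule, and reward definition are assumptions rather than the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_sentences(w, features):
    """Sample keep/drop decisions for each sentence in a distantly supervised bag.
    features: (n_sentences, dim) array; w: logistic policy weights of the selector."""
    probs = 1.0 / (1.0 + np.exp(-features @ w))
    keep = rng.random(len(probs)) < probs
    return keep, probs

def reinforce_update(w, features, keep, probs, reward, lr=0.01):
    """REINFORCE step: the scalar reward would come from the relation extractor
    (e.g., its likelihood on the bag label when trained on the kept sentences)."""
    grad_logp = features.T @ (keep.astype(float) - probs)
    return w + lr * reward * grad_logp

# Toy usage with made-up features and reward
w = np.zeros(64)
feats = rng.standard_normal((10, 64))
keep, probs = select_sentences(w, feats)
w = reinforce_update(w, feats, keep, probs, reward=0.3)
```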

Developing Reinforcement Learning based Job Allocation Model by Using FlexSim Software (FlexSim 소프트웨어를 이용한 강화학습 기반 작업 할당 모형 개발)

  • Jin-Sung Park;Jun-Woo Kim
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.311-313
    • /
    • 2023
  • To use resources efficiently in a parallel-machine shop, incoming jobs must be allocated to appropriate machines. Heuristics can be used to choose the machine that will process a given job, but developing a heuristic customized to a specific shop is not easy. In this paper, we instead apply reinforcement learning to develop a job allocation model for a shop with heterogeneous parallel machines. The episodes needed to train the job allocation model were generated with FlexSim, a commercial simulation software package, and reinforcement learning algorithms from the stable-baselines3 library were applied to the generated episodes. The experimental results show that simulation and reinforcement learning are useful for shop-floor operations management.
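
The work couples FlexSim-generated episodes with stable-baselines3. The FlexSim bridge itself is not shown here, so the sketch below swaps in a toy Gymnasium environment with a comparable interface (queue lengths as observation, target machine as action) only to illustrate how stable-baselines3 would be driven; the environment dynamics and reward are assumptions:

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class ToyJobShopEnv(gym.Env):
    """Stand-in for the FlexSim-driven environment (the real coupling is not shown).
    Observation = queue length of each machine; action = machine to receive the next job."""
    def __init__(self, n_machines=3):
        self.n = n_machines
        self.observation_space = gym.spaces.Box(0.0, np.inf, shape=(self.n,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(self.n)
        self.queues = np.zeros(self.n, dtype=np.float32)
        self.t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.queues = np.zeros(self.n, dtype=np.float32)
        self.t = 0
        return self.queues.copy(), {}

    def step(self, action):
        self.queues[action] += self.np_random.uniform(1.0, 3.0)  # processing time of the new job
        self.queues = np.maximum(self.queues - 1.0, 0.0)          # one time unit of processing
        self.t += 1
        reward = -float(self.queues.max())                        # discourage load imbalance
        return self.queues.copy(), reward, self.t >= 200, False, {}

model = PPO("MlpPolicy", ToyJobShopEnv(), verbose=0)
model.learn(total_timesteps=5_000)
```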

Real-time RL-based 5G Network Slicing Design and Traffic Model Distribution: Implementation for V2X and eMBB Services

  • WeiJian Zhou;Azharul Islam;KyungHi Chang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.9
    • /
    • pp.2573-2589
    • /
    • 2023
  • As 5G mobile systems carry multiple services and applications, serving numerous user and application types with varying quality of service requirements inside a single physical network infrastructure is the primary challenge in constructing 5G networks. Radio Access Network (RAN) slicing is introduced as a way to address this challenge. This research focuses on optimizing RAN slices within a single physical cell for vehicle-to-everything (V2X) and enhanced mobile broadband (eMBB) UEs, highlighting the importance of adept resource management and allocation for the evolving landscape of 5G services. We put forth two strategies: offline network slicing, also referred to as standard network slicing, and online reinforcement learning (RL) based network slicing. Both strategies aim to maximize network efficiency by gathering network model characteristics and augmenting radio resources for eMBB and V2X UEs. Compared to traditional network slicing, RL-based network slicing shows greater performance in the allocation and utilization of UE resources. These steps adapt to fluctuating traffic loads using RL strategies, with the ultimate objective of bolstering the efficiency of generic 5G services.
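
The online RL slicing strategy is described only at the level of augmenting radio resources for eMBB and V2X UEs. As a toy illustration, with the state encoding, action granularity, and reward all assumed, a tabular agent that picks the V2X share of resource blocks could look like this:

```python
import numpy as np

# Assumed setup: split TOTAL_RBS resource blocks between the V2X and eMBB slices.
# Actions give the V2X share in 10% steps; state = discretized traffic load of each slice.
TOTAL_RBS, N_SPLITS, N_LOAD_BINS = 100, 11, 5
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = np.zeros((N_LOAD_BINS, N_LOAD_BINS, N_SPLITS))
rng = np.random.default_rng(0)

def act(v2x_load_bin, embb_load_bin):
    """Epsilon-greedy choice of the V2X share of resource blocks."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_SPLITS))
    return int(np.argmax(Q[v2x_load_bin, embb_load_bin]))

def rbs_for_v2x(action):
    """Map the discrete action to an actual resource-block count for the V2X slice."""
    return round(action / (N_SPLITS - 1) * TOTAL_RBS)

def update(state, action, reward, next_state):
    """One-step Q-learning; a plausible (assumed) reward would combine V2X latency
    satisfaction with eMBB throughput."""
    Q[state + (action,)] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state + (action,)])
```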