• Title/Summary/Keyword: Multi-Armed Bandit (MAB)

Opportunistic Spectrum Access Based on a Constrained Multi-Armed Bandit Formulation

  • Ai, Jing; Abouzeid, Alhussein A.
    • Journal of Communications and Networks, v.11 no.2, pp.134-147, 2009
  • Tracking and exploiting instantaneous spectrum opportunities are fundamental challenges in opportunistic spectrum access (OSA) in the presence of the bursty traffic of primary users and the limited spectrum sensing capability of secondary users. To take advantage of the history of spectrum sensing and access decisions, a sequential decision framework is widely used to design optimal policies. However, many existing schemes based on a partially observed Markov decision process (POMDP) framework reveal that optimal policies are non-stationary in nature, which renders them difficult to calculate and implement. Therefore, this work pursues stationary OSA policies, which are efficient yet low-complexity, while still incorporating many practical factors, such as spectrum sensing errors and a priori unknown statistical spectrum knowledge. First, with an approximation on channel evolution, OSA is formulated in a multi-armed bandit (MAB) framework. As a result, the optimal policy is specified by the well-known Gittins index rule, where the channel with the largest Gittins index is always selected. Then, closed-form formulas are derived for the Gittins indices with tunable approximation, and the design of a reinforcement learning algorithm is presented for calculating the Gittins indices, depending on whether the Markovian channel parameters are available a priori or not. Finally, the superiority of the scheme is demonstrated through extensive experiments against other existing schemes in terms of the quality of policies and optimality.
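
The index rule described in this abstract can be illustrated with a minimal sketch. It is not the paper's closed-form formula or its reinforcement learning algorithm: it computes Gittins indices numerically for a small, fully observed Markov reward arm using the restart-in-state formulation, then selects the channel whose current state has the largest index. The function names, the two-state busy/idle chain, and the discount factor `beta` are assumptions made for illustration only.

```python
def gittins_indices(P, r, beta, iters=2000):
    """Gittins index of every state of one finite-state Markov reward arm.

    Restart-in-state formulation: for each state i, solve the problem in
    which, at every step, one may either continue from the current state
    or restart from state i. The index of i is (1 - beta) * V_i(i).
    P is a row-stochastic transition matrix, r the per-state rewards.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(iters):  # plain value iteration
            cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                    for j in range(n)]
            restart = cont[i]  # value of jumping back to state i
            V = [max(c, restart) for c in cont]
        indices.append((1.0 - beta) * V[i])
    return indices


def select_channel(index_tables, current_states):
    """Index rule: pick the channel whose current state has the largest index."""
    scores = [table[s] for table, s in zip(index_tables, current_states)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical channel model: state 0 = busy (reward 0), state 1 = idle (reward 1).
P = [[0.8, 0.2], [0.3, 0.7]]
r = [0.0, 1.0]
table = gittins_indices(P, r, beta=0.9)

# Three statistically identical channels; only channel 1 is currently idle.
choice = select_channel([table, table, table], [0, 1, 0])
```

The sketch assumes the channel state is observed without error, which the paper explicitly does not assume; handling sensing errors would require working with belief states rather than raw states.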

Reinforcement Learning-Based Illuminance Control Method for Building Lighting System (강화학습 기반 빌딩의 방별 조명 시스템 조도값 설정 기법)

  • Kim, Jongmin; Kim, Sunyong
    • Journal of IKEEE, v.26 no.1, pp.56-61, 2022
  • Various efforts have been made worldwide to respond to environmental problems such as climate change. Research on artificial intelligence (AI)-based energy management has been widely conducted as the most effective way to alleviate the climate change problem. In particular, buildings, which account for more than 20% of the total energy delivered worldwide, have become a focus of energy management through the building energy management system (BEMS). In this paper, we propose a multi-armed bandit (MAB)-based energy management algorithm that efficiently decides the energy consumption level of the lighting system in each room of a building while minimizing the discomfort of the occupants of each room.
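
As a rough illustration of how a bandit could choose a lighting level for one room, the sketch below runs the standard UCB1 rule over a few discrete illuminance levels. The abstract does not state which bandit algorithm or reward function the authors use, so UCB1, the candidate levels, and `toy_reward` (a made-up score that penalizes both deviation from a preferred illuminance and energy use) are all assumptions.

```python
import math


def ucb1_select(counts, means, t, c=2.0):
    """Return the arm with the largest UCB1 score; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_reward(level_lux, preferred_lux=400.0, energy_weight=0.3):
    """Hypothetical reward: high when close to the preferred level and cheap."""
    discomfort = abs(level_lux - preferred_lux) / 1000.0
    energy = level_lux / 1000.0
    return 1.0 - discomfort - energy_weight * energy


def run_room(levels, reward_fn, rounds):
    """Run UCB1 for one room and return per-level pull counts and mean rewards."""
    counts = [0] * len(levels)
    means = [0.0] * len(levels)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(levels[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


# Hypothetical candidate illuminance levels (lux) for a single room.
levels = [100.0, 400.0, 700.0]
counts, means = run_room(levels, toy_reward, rounds=500)
best_level = levels[counts.index(max(counts))]
```

In a multi-room building, one such learner could be run per room; a real reward would come from measured energy use and occupant feedback rather than a fixed formula.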

Hybrid Offloading Technique Based on Auction Theory and Reinforcement Learning in MEC Industrial IoT Environment (MEC 산업용 IoT 환경에서 경매 이론과 강화 학습 기반의 하이브리드 오프로딩 기법)

  • Bae Hyeon Ji; Kim Sung Wook
    • KIPS Transactions on Computer and Communication Systems, v.12 no.9, pp.263-272, 2023
  • The Industrial Internet of Things (IIoT) is an important factor in increasing production efficiency in industrial sectors, along with data collection, exchange, and analysis through large-scale connectivity. However, as traffic increases explosively due to the recent spread of IIoT, an allocation method that can process traffic efficiently is required. In this thesis, I propose a two-stage task offloading decision method to increase successful task throughput in an IIoT environment. In addition, I consider a hybrid offloading system that can offload compute-intensive tasks to a mobile edge computing (MEC) server via a cellular link or to a nearby IIoT device via a device-to-device (D2D) link. The first stage designs an incentive mechanism to prevent devices participating in task offloading from acting selfishly and thereby hindering improvements in task throughput. Among mechanism designs, McAfee's mechanism is used to control the selfish behavior of the devices that process the tasks and to increase the overall system throughput. In the second stage, I propose a multi-armed bandit (MAB)-based task offloading decision method for a non-stationary environment that accounts for the irregular movement of IIoT devices. Experimental results show that the proposed method achieves better performance than other existing methods in terms of overall system throughput, communication failure rate, and regret.
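
To make the idea of a bandit in a non-stationary environment concrete, here is a minimal sketch of one common approach: epsilon-greedy selection with a constant step size, which weights recent rewards more heavily so the estimates can follow changing conditions. The abstract does not say which non-stationary bandit algorithm is used, so this technique, the class name, and the two-target simulation (with success probabilities that swap halfway through) are assumptions for illustration.

```python
import random


class RecencyWeightedBandit:
    """Epsilon-greedy bandit with a constant step size.

    A constant step size gives an exponential recency-weighted average,
    so value estimates can track arms whose rewards drift over time.
    """

    def __init__(self, n_arms, epsilon=0.1, step_size=0.2, seed=0):
        self.q = [0.0] * n_arms
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, arm, reward):
        self.q[arm] += self.step_size * (reward - self.q[arm])


def simulate(rounds=2000, seed=1):
    """Hypothetical setup: arm 0 = MEC server, arm 1 = D2D neighbor."""
    rng = random.Random(seed)
    agent = RecencyWeightedBandit(n_arms=2, seed=seed)
    successes = 0.0
    for t in range(rounds):
        # Made-up success probabilities that swap halfway through the run.
        p = [0.8, 0.3] if t < rounds // 2 else [0.3, 0.8]
        arm = agent.select()
        reward = 1.0 if rng.random() < p[arm] else 0.0
        agent.update(arm, reward)
        successes += reward
    return successes / rounds


success_rate = simulate()
```

A sample-average estimate (step size 1/n) would adapt more and more slowly after the swap, which is why a constant step size is the usual first choice when rewards are expected to drift.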