• Title/Abstract/Keyword: Markov Decision Process (MDP)

Search results: 35 items

A Semi-Markov Decision Process (SMDP) for Active State Control of a Heterogeneous Network

  • Yang, Janghoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 10, No. 7
    • /
    • pp.3171-3191
    • /
    • 2016
  • Due to the growing demand for wireless data traffic, a large number of base stations (BSs) of different types have been installed. However, space-time dependent wireless data traffic densities can leave a significant number of BSs idle, which wastes power resources. To deal with this problem, we propose an active state control algorithm based on a semi-Markov decision process (SMDP) for a heterogeneous network. An MDP in the discrete time domain is formulated from the continuous domain with some approximation. A suboptimal online learning algorithm with a random policy is proposed to solve the problem. We explicitly include a coverage constraint so that active cells provide the same signal-to-noise ratio (SNR) coverage at a targeted outage rate. Simulation results verify that the proposed algorithm properly controls the active state depending on traffic densities, without excessively increasing the number of handovers, while providing the average user perceived rate (UPR) in a more power-efficient way than a conventional algorithm.
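
As a concrete illustration of this kind of formulation, the following sketch sets up a toy discrete-time MDP for switching a single small-cell BS between idle and active under two traffic levels and solves it by value iteration. All quantities (power costs, the outage penalty standing in for the coverage constraint, the switching cost standing in for handover overhead, and the traffic transition probabilities) are invented for illustration and are not the paper's model.

```python
import itertools
import numpy as np

traffic = [0, 1]            # 0 = low, 1 = high traffic density
actions = [0, 1]            # next BS state: 0 = idle, 1 = active
states = list(itertools.product(traffic, actions))  # (traffic, current BS state)

P_traffic = np.array([[0.9, 0.1],    # low  -> low / high
                      [0.3, 0.7]])   # high -> low / high

def reward(t, b, a):
    power = 1.0 if a == 1 else 0.1               # active BS consumes more power
    outage = 5.0 if (t == 1 and a == 0) else 0.0 # proxy for the coverage constraint
    switch = 0.2 if a != b else 0.0              # proxy for handover overhead
    return -(power + outage + switch)

gamma = 0.95
V = {s: 0.0 for s in states}
for _ in range(500):                             # value iteration
    V = {(t, b): max(reward(t, b, a)
                     + gamma * sum(P_traffic[t, t2] * V[(t2, a)] for t2 in traffic)
                     for a in actions)
         for (t, b) in states}

policy = {(t, b): max(actions,
                      key=lambda a: reward(t, b, a)
                      + gamma * sum(P_traffic[t, t2] * V[(t2, a)] for t2 in traffic))
          for (t, b) in states}
print(policy)  # expect: stay active under high traffic, go idle under low
```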

Optimal Stochastic Policies in Network-Coding-Capable Ad Hoc Networks

  • Oh, Hayoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 8, No. 12
    • /
    • pp.4389-4410
    • /
    • 2014
  • Network coding is a promising technology that increases system throughput by reducing the number of packet transmissions from the source node to the destination node in a saturated traffic scenario. Nevertheless, some packets can suffer from end-to-end delay because of the queuing delay at an intermediate node waiting for other packets to be encoded with exclusive-or (XOR). In this paper, we analyze the delay as a function of the packet arrival rate and propose two network coding schemes, iXOR (Intelligent XOR) and oXOR (Optimal XOR), based on a Markov Decision Process (MDP). They reduce the average delay, even under an unsaturated traffic load, through a Holding-χ strategy. In particular, we are interested in the unsaturated network scenario, which is more practical because, in a real wireless network, nodes do not always have packets waiting to be sent. Through analysis and extensive simulations, we show that iXOR and oXOR outperform the Distributed Coordination Function (DCF) without XOR (the general forwarding scheme) and XOR over DCF with respect to average delay as well as delivery ratio.
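
The hold-or-forward tension that the authors formalize as an MDP can be seen in a back-of-the-envelope model: a relay that holds a packet for up to h slots meets a coding partner with probability p per slot, saving one transmission if it does, at the price of extra queuing delay. The sketch below, with invented costs and a geometric partner-arrival assumption, shows how the optimal holding horizon grows with traffic intensity; it is not the paper's iXOR/oXOR formulation.

```python
def expected_cost(h, p, d=0.1):
    """Expected cost of holding for up to h slots: delay penalty + transmissions."""
    no_partner = (1 - p) ** h                      # never found a coding partner
    # expected number of waiting slots, truncated at h
    wait = sum(k * p * (1 - p) ** (k - 1) for k in range(1, h + 1)) + h * no_partner
    # 2 transmissions if the packet is forwarded alone, 1 if it gets XOR-coded
    return d * wait + 2.0 * no_partner + 1.0 * (1 - no_partner)

for p in (0.05, 0.3, 0.8):
    best_h = min(range(50), key=lambda h: expected_cost(h, p))
    print(f"partner arrival prob {p}: hold up to {best_h} slots")
```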

Design of Markov Decision Process Based Dialogue Manager

  • 최준기;은지현;장두성;김현정;구명완
    • 대한음성학회: Conference Proceedings
    • /
    • Proceedings of the 대한음성학회 2006 Autumn Conference
    • /
    • pp.14-18
    • /
    • 2006
  • The role of a dialogue manager is to select proper actions based on the observed environment and the inferred user intention. This paper presents a stochastic model for a dialogue manager based on a Markov decision process (MDP). To build a mixed-initiative dialogue manager, we used the accumulated user utterances, the previous act of the dialogue manager, and domain-dependent knowledge as inputs to the MDP. We also used a dialogue corpus to train the automatically optimized policy of the MDP with a reinforcement learning algorithm. States with unique and intuitive actions were removed from the MDP design by using domain knowledge. The dialogue manager was combined with natural language understanding and a response generator to build short-message-based remote control of home-networked appliances.
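
To make the reinforcement learning setup concrete, here is a minimal tabular Q-learning sketch for a slot-filling dialogue MDP. The states, actions, rewards, and the 0.8 slot-filling success rate are invented stand-ins for the paper's corpus-derived design, not a reproduction of it.

```python
import random

N_SLOTS, ASK, EXECUTE = 2, 0, 1
Q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in (ASK, EXECUTE)}
alpha, gamma, eps = 0.1, 0.95, 0.1

random.seed(0)
for _ in range(5000):
    s = 0                                   # dialogue starts with no slots filled
    while True:
        a = random.choice((ASK, EXECUTE)) if random.random() < eps \
            else max((ASK, EXECUTE), key=lambda x: Q[(s, x)])
        if a == EXECUTE:                    # terminal action
            r = 10.0 if s == N_SLOTS else -10.0   # executing too early is punished
            Q[(s, a)] += alpha * (r - Q[(s, a)])
            break
        # asking fills a slot w.p. 0.8 (the user may mis-speak); small turn cost
        s2 = min(s + 1, N_SLOTS) if random.random() < 0.8 else s
        Q[(s, a)] += alpha * (-1.0 + gamma * max(Q[(s2, x)] for x in (ASK, EXECUTE))
                              - Q[(s, a)])
        s = s2

print({s: "ask" if max((ASK, EXECUTE), key=lambda x: Q[(s, x)]) == ASK else "execute"
       for s in range(N_SLOTS + 1)})        # expect: ask, ask, execute
```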


Determination of Ship Collision Avoidance Path using Deep Deterministic Policy Gradient Algorithm

  • 김동함;이성욱;남종호;요시타카 후루카와
    • 대한조선학회논문집
    • /
    • Vol. 56, No. 1
    • /
    • pp.58-65
    • /
    • 2019
  • The stability, reliability, and efficiency of a smart ship are important issues, as interest in autonomous ships has recently been high. An automatic collision avoidance system is an essential function of an autonomous ship: it detects the possibility of collision and automatically takes avoidance actions in consideration of economy and safety. In order to construct an automatic collision avoidance system using reinforcement learning, this work mathematically formulates the sequential decision problem of ship collision avoidance as a Markov Decision Process (MDP). A reinforcement learning environment is constructed based on the ship maneuvering equations, and the three key components of the MDP (state, action, and reward) are defined. The state uses parameters of the relationship between the own ship and the target ship, the action is the perpendicular distance away from the target course, and the reward is defined as a function considering safety and economics. To solve the sequential decision problem, the Deep Deterministic Policy Gradient (DDPG) algorithm, which can express a continuous action space and search for an optimal action policy, is utilized. The collision avoidance system is then tested in an assumed 90° intersection encounter situation and yields satisfactory results.
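
The abstract specifies the reward as a function weighing safety against economy; a hedged sketch of what such a term might look like is shown below. The safe-distance threshold, weights, and functional form are illustrative assumptions, not the paper's definition, and the DDPG training loop itself is omitted.

```python
def reward(dist_to_target_ship, offset_from_course,
           safe_dist=1852.0, w_safety=1.0, w_economy=0.1):
    """Toy collision-avoidance reward: safety term + economy term (all assumed)."""
    # safety: penalize closing inside the safe passing distance (1 nm assumed)
    safety = -w_safety * max(0.0, (safe_dist - dist_to_target_ship) / safe_dist)
    # economy: penalize lateral offset from the planned course (the action)
    economy = -w_economy * abs(offset_from_course) / safe_dist
    return safety + economy

print(reward(dist_to_target_ship=900.0, offset_from_course=200.0))
```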

Seamless Mobility of Heterogeneous Networks Based on Markov Decision Process

  • Preethi, G.A.;Chandrasekar, C.
    • Journal of Information Processing Systems
    • /
    • Vol. 11, No. 4
    • /
    • pp.616-629
    • /
    • 2015
  • A mobile terminal can expect a number of handoffs within its call duration. During a call, when a mobile node moves from one cell to another, it should connect to another access point within its range; if its own network offers no support there, it must change over to another base station. When moving to another network, quality of service (QoS) parameters need to be considered. In our study we use the Markov decision process approach for seamless handoff, as it gives optimal results for selecting a network compared to other multiple-attribute decision-making processes. We use a network cost function for selecting the network for handoff and a connection reward function based on the values of the QoS parameters. We also examine the constant bit rate and the transmission control protocol packet delivery ratio. The policy iteration algorithm is used to determine the optimal policy. Our enhanced handoff algorithm outperforms previous multiple-attribute decision-making methods.
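
Since the optimal policy here is computed with the policy iteration algorithm, a generic sketch of that method on a toy three-network handoff MDP follows; the random transition probabilities and rewards are placeholders rather than the paper's network cost and connection reward functions.

```python
import numpy as np

n_s, n_a, gamma = 3, 3, 0.9        # state = current network, action = network to use
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] is a next-state distribution
R = rng.uniform(0.0, 1.0, size=(n_s, n_a))        # connection reward per (s, a)

policy = np.zeros(n_s, dtype=int)
while True:
    # policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    P_pi = P[np.arange(n_s), policy]
    R_pi = R[np.arange(n_s), policy]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # policy improvement: greedy with respect to the evaluated V
    new_policy = (R + gamma * P @ V).argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("network to select, per current network:", policy)
```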

Operating Room Reservation Problem Considering Patient Priority: Modified Value Iteration Method with Binary Search

  • 민대기
    • 산업공학
    • /
    • Vol. 24, No. 4
    • /
    • pp.274-280
    • /
    • 2011
  • Delayed access to surgery may lead to deterioration of the patient's condition, poor clinical outcomes, an increased probability of emergency admission, or even death. The purpose of this work is to decide the number of patients selected from a waiting list and to schedule them in accordance with the operating room capacity in the next period. We formulate the problem as an infinite-horizon Markov Decision Process (MDP), which attempts to strike a balance between patient waiting times and overtime work. Structural properties of the proposed model are investigated to facilitate the solution procedure. The proposed procedure modifies the conventional value iteration method with a binary search technique. An example of the optimal policy is provided, and computational results show that the proposed procedure improves computational efficiency.
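
The contribution modifies standard value iteration, so the sketch below shows only that baseline on a toy discounted MDP with an epsilon-optimal stopping rule; the sizes and data are random, and the binary-search modification itself is not reproduced.

```python
import numpy as np

n_s, n_a, gamma, eps = 10, 3, 0.95, 1e-6
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # random toy dynamics
R = rng.uniform(-1.0, 0.0, size=(n_s, n_a))        # costs: waiting time, overtime

V = np.zeros(n_s)
while True:                                        # standard value iteration
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < eps * (1 - gamma) / (2 * gamma):
        break                                      # epsilon-optimal stopping rule
    V = V_new

print("greedy admission policy per state:", Q.argmax(axis=1))
```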

A Stochastic Dynamic Programming Model to Derive Monthly Operating Policy of a Multi-Reservoir System

  • 임동규;김재희;김승권
    • 경영과학
    • /
    • Vol. 29, No. 1
    • /
    • pp.1-14
    • /
    • 2012
  • The goal of multi-reservoir operation planning is to provide an optimal release plan that maximizes reservoir storage and hydropower generation while minimizing spillage. However, reservoir operation is difficult due to the uncertainty associated with inflows. In order to consider uncertain inflows in the reservoir operating problem, we present a Stochastic Dynamic Programming (SDP) model based on the Markov decision process (MDP). The objective of the model is to maximize the expected value of system performance, defined as the weighted sum of all expected objective values. With the SDP model, a multi-reservoir operating rule can be derived, and the model also generates the steady-state probabilities of reservoir storage and inflow as output. We applied the model to the Geum River basin in Korea and generated a monthly multi-reservoir operating plan that accounts for inflow uncertainty.
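
A compact single-reservoir analogue of such an SDP model is sketched below: storage and inflow are discretized, inflow classes follow an assumed Markov chain, and the weighted multi-objective reward is collapsed into one toy expression (hydropower plus a storage value minus a spill penalty). None of the numbers come from the Geum River model.

```python
import numpy as np

n_storage, n_inflow, n_release, gamma = 5, 3, 3, 0.98
P_inflow = np.array([[0.6, 0.3, 0.1],     # assumed inflow-class transitions
                     [0.25, 0.5, 0.25],
                     [0.1, 0.3, 0.6]])

def step(s, q, r):
    """Water balance on a discretized grid; returns (next storage, reward)."""
    raw = s + q - r
    s2 = int(min(max(raw, 0), n_storage - 1))
    spill = max(raw - (n_storage - 1), 0)
    release = min(r, s + q)               # cannot release water that is not there
    return s2, 1.0 * release + 0.1 * s2 - 2.0 * spill

V = np.zeros((n_storage, n_inflow))
for _ in range(300):                      # successive approximation
    V = np.array([[max(step(s, q, r)[1] + gamma * (P_inflow[q] @ V[step(s, q, r)[0]])
                       for r in range(n_release))
                   for q in range(n_inflow)]
                  for s in range(n_storage)])
print(np.round(V, 2))                     # value of each (storage, inflow class) state
```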

A Stochastic Optimization Model for Equipment Replacement Considering Life Uncertainty

  • 박종인;김승권
    • 한국국방경영분석학회지
    • /
    • Vol. 29, No. 2
    • /
    • pp.100-110
    • /
    • 2003
  • An equipment replacement policy cannot be defined with certainty, because the physical state of a technological system cannot be determined with foresight. This paper presents a Markov Decision Process (MDP) model for army equipment that is subject to uncertain deterioration and, ultimately, failure. The components of the MDP model are defined as follows: i) the state is the age of the equipment; ii) the actions are 'keep' and 'replace'; iii) the cost is the expected cost per unit time associated with the 'keep' and 'replace' actions; iv) the transition probabilities are derived from a Weibull distribution. Using the MDP model, we can determine the optimal replacement policy for an army equipment replacement problem.
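
A minimal sketch of this keep/replace structure follows: a discrete hazard is derived from an assumed Weibull survival function, and value iteration over the equipment age yields the familiar control-limit policy. Note the sketch discounts costs, whereas the paper works with expected cost per unit time; the shape and scale parameters, costs, and discount factor are all illustrative.

```python
import numpy as np

k, lam = 2.0, 10.0                       # assumed Weibull shape / scale (years)
max_age, gamma = 30, 0.95
c_keep, c_replace, c_fail = 1.0, 20.0, 50.0

def fail_prob(t):
    """Discrete hazard: P(fail during year t+1 | survived to age t)."""
    S = lambda x: np.exp(-(x / lam) ** k)
    return 1.0 - S(t + 1) / S(t)

V = np.zeros(max_age + 1)
for _ in range(500):                     # value iteration over equipment age
    keep = np.array([c_keep + fail_prob(t) * (c_fail + gamma * V[0])
                     + (1 - fail_prob(t)) * gamma * V[min(t + 1, max_age)]
                     for t in range(max_age + 1)])
    replace = c_replace + gamma * V[0]   # same cost for every age
    V = np.minimum(keep, replace)

policy = ["replace" if replace <= k_t else "keep" for k_t in keep]
print(policy)       # control-limit: keep while young, replace past a threshold age
```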

Deep Reinforcement Learning-based Distributed Routing Algorithm for Minimizing End-to-end Delay in MANET

  • Choi, Yeong-Jun;Seo, Ju-Sung;Hong, Jun-Pyo
    • 한국정보통신학회논문지
    • /
    • Vol. 25, No. 9
    • /
    • pp.1267-1270
    • /
    • 2021
  • In this paper, we propose a distributed routing algorithm for mobile ad hoc networks (MANETs), in which mobile devices are utilized as relays for communication between remote source and destination nodes. The objective of the proposed algorithm is to minimize the end-to-end communication delay caused by transmission failures under deep channel fading. At each hop, a node needs to select the next relaying node by considering the tradeoff between link stability and forward link distance. Based on this feature, we formulate the problem as a partially observable Markov decision process (MDP) and apply deep reinforcement learning to derive an effective routing strategy for the formulated MDP. Simulation results show that the proposed algorithm outperforms other baseline schemes in terms of average end-to-end delay.
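
The per-hop tradeoff between forward progress and link stability can be illustrated without the deep RL machinery. The greedy scoring sketch below uses an assumed exponential-decay link-success model and ranks neighbors by progress per unit of expected one-hop delay; it is a heuristic stand-in, not the learned policy from the paper.

```python
import math, random

def link_success_prob(d, fading_scale=50.0):
    """Assumed fading model: success probability decays with link distance."""
    return math.exp(-(d / fading_scale) ** 2)

def expected_one_hop_delay(d, slot=1.0):
    return slot / max(link_success_prob(d), 1e-9)  # retransmit until success

def choose_next_hop(neighbors, dest):
    """Greedy relay choice: forward progress per unit of expected delay."""
    def score(n):
        progress = math.dist(dest, (0.0, 0.0)) - math.dist(dest, n)
        return progress / expected_one_hop_delay(math.dist((0.0, 0.0), n))
    return max(neighbors, key=score)               # current node sits at the origin

random.seed(0)
neighbors = [(random.uniform(0, 80), random.uniform(-20, 20)) for _ in range(8)]
print(choose_next_hop(neighbors, dest=(200.0, 0.0)))
```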

MDP Modeling for the Prediction of Agent Movement in Limited Space

  • 진효원;김수환;정치정;이문걸
    • 한국경영과학회지
    • /
    • Vol. 40, No. 3
    • /
    • pp.63-72
    • /
    • 2015
  • This paper addresses the problem of predicting the movement of an agent in an enclosed space using a Markov Decision Process (MDP). Recent research on optimal path finding has been confined to deriving the shortest path with deterministic algorithms such as A* or Dijkstra. In contrast, this study focuses on predicting, with a stochastic method, the path that the agent chooses to escape the limited space as time passes. An MDP reward structure built from GIS (Geographic Information System) data makes the model feasible. The model was shown to have high predictive power when applied to the route of a past armed guerrilla infiltration.
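
The sketch below mimics this setup on a toy grid: cell-crossing costs stand in for the GIS-derived reward structure, value iteration solves the MDP, and a greedy rollout of the resulting policy gives the predicted escape path. The terrain costs, grid size, and exit location are invented.

```python
import numpy as np

H, W, gamma = 6, 6, 0.95
cost = np.full((H, W), 0.1)               # easy terrain
cost[2, 1:5] = 1.0                        # e.g. a ridge that is hard to cross
exit_cell = (5, 5)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def clip(r, c):
    return min(max(r, 0), H - 1), min(max(c, 0), W - 1)

V = np.zeros((H, W))
for _ in range(300):                      # value iteration on the grid MDP
    V_new = np.full((H, W), -1e9)
    V_new[exit_cell] = 0.0                # absorbing exit
    for r in range(H):
        for c in range(W):
            if (r, c) == exit_cell:
                continue
            for dr, dc in moves:
                r2, c2 = clip(r + dr, c + dc)
                V_new[r, c] = max(V_new[r, c], -cost[r2, c2] + gamma * V[r2, c2])
    V = V_new

pos, path = (0, 0), [(0, 0)]
while pos != exit_cell and len(path) < 50:  # greedy rollout = predicted path
    pos = max((clip(pos[0] + dr, pos[1] + dc) for dr, dc in moves),
              key=lambda p: -cost[p] + gamma * V[p])
    path.append(pos)
print(path)
```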