• Title, Summary, Keyword: Markov Decision Process

### Equivalent Transformations of Undiscounted Nonhomogeneous Markov Decision Processes

• Park, Yun-Sun
• Journal of the Korean Operations Research and Management Science Society
• /
• v.17 no.2
• /
• pp.131-144
• /
• 1992
• Even though nonhomogeneous Markov Decision Processes subsume homogeneous Markov Decision Processes and are more practical in the real world, there are not many results for them. In this paper we address the nonhomogeneous Markov Decision Process with the objective of maximizing average reward. By extending the work of Ross [17] in the homogeneous case and adopting the result of Bean and Smith [3] for the discounted deterministic problem, we first transform the original problem into a discounted nonhomogeneous Markov Decision Process. Second, we transform it into a discounted deterministic problem. This approach not only shows the interrelationships between the various problems but also provides a solution method for the undiscounted nonhomogeneous Markov Decision Process.

### On The Mathematical Structure of Markov Process and Markovian Sequential Decision Process (Markov 과정(過程)의 수리적(數理的) 구조(構造)와 그 축차결정과정(逐次決定過程))

• Kim, Yu-Song
• Journal of the Korean Society for Quality Management
• /
• v.11 no.2
• /
• pp.2-9
• /
• 1983
• This paper studies the mathematical structure of Markov processes and Markovian sequential decision processes (the policy improvement iteration method), and analyzes the logic and behavioral characteristics of the mathematical model of a Markov process. First, regarding the mathematical structure of Markov processes, it classifies the forward and backward forms of the Chapman-Kolmogorov equation and of the Kolmogorov differential equations, and then surveys the logic of these equation systems and the question of existence and uniqueness of their solutions. Second, for the Markovian sequential decision process, it distinguishes the discrete-time and continuous-time cases, and then explores the logical structure of the behavior, the value determination operation, and the policy improvement routine.
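
The value determination operation and the policy improvement routine named in this abstract can be illustrated with a minimal sketch for the discrete-time, discounted case. Everything below is an assumption made for illustration and is not taken from the paper: the two-state, two-action numbers, the discount factor `GAMMA`, and the helper names `evaluate`, `improve`, and `policy_iteration`. Value determination is done here by repeated substitution rather than by solving the linear equations directly.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative only.
# P[a][s][t] = probability of moving from state s to state t under action a.
P = [
    [[0.5, 0.5], [0.4, 0.6]],  # action 0
    [[0.8, 0.2], [0.7, 0.3]],  # action 1
]
# R[a][s] = expected immediate reward for taking action a in state s.
R = [
    [6.0, -3.0],  # action 0
    [4.0, -5.0],  # action 1
]
GAMMA = 0.9  # assumed discount factor


def evaluate(policy, P, R, gamma, tol=1e-10):
    """Value determination: approximate the value of a fixed policy
    by repeated substitution until successive values differ by < tol."""
    n = len(policy)
    v = [0.0] * n
    while True:
        new_v = [
            R[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def improve(v, P, R, gamma):
    """Policy improvement: in each state pick the action with the
    largest one-step lookahead value (first index wins ties)."""
    n, n_actions = len(v), len(P)
    policy = []
    for s in range(n):
        q = [
            R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
            for a in range(n_actions)
        ]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy


def policy_iteration(P, R, gamma):
    """Alternate the two steps until the policy stops changing."""
    policy = [0] * len(P[0])
    while True:
        v = evaluate(policy, P, R, gamma)
        new_policy = improve(v, P, R, gamma)
        if new_policy == policy:
            return policy, v
        policy = new_policy


if __name__ == "__main__":
    print(policy_iteration(P, R, GAMMA))
```

With these assumed numbers the loop stops after two improvement rounds, selecting action 1 in both states with values of roughly 22.2 and 12.3.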

### Markov Decision Process for Curling Strategies (MDP에 의한 컬링 전략 선정)

• Bae, Kiwook;Park, Dong Hyun;Kim, Dong Hyun;Shin, Hayong
• Journal of Korean Institute of Industrial Engineers
• /
• v.42 no.1
• /
• pp.65-72
• /
• 2016
• Curling is often compared to chess because of the variety and importance of its strategies. To win a curling game, selecting the optimal strategy at each decision point is important. However, there is a lack of research on optimal curling strategies. 'Aggressive' and 'conservative' strategies are common in curling; nevertheless, even these two strategies have not been studied before. In this study, a Markov Decision Process is applied to curling strategy analysis. The two strategies are defined as the actions of the Markov Decision Process. By solving the model, the optimal strategy can be found for any in-game state.

### System Replacement Policy for A Partially Observable Markov Decision Process Model

• Kim, Chang-Eun
• Journal of Korean Institute of Industrial Engineers
• /
• v.16 no.2
• /
• pp.1-9
• /
• 1990
• The control of deterioration processes for which only incomplete state information is available is examined in this study. When the deterioration is governed by a Markov process, such processes are known as Partially Observable Markov Decision Processes (POMDPs), which eliminate the assumption that the state or level of deterioration of the system is known exactly. This research investigates a two-state partially observable Markov chain in which only deterioration can occur and for which the only possible actions are to replace the system or to leave it alone. The goal of this research is to develop a new jump algorithm that has the potential to solve system problems dealing with continuous-state-space Markov chains.
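
As a rough illustration of the two-state replace-or-leave-alone setting described in this abstract, the sketch below tracks the belief that the system has deteriorated using a standard Bayes update. It is not the jump algorithm the paper develops. The failure probability, the alarm-style observation model, and the function names `predict`, `correct`, and `step` are all hypothetical.

```python
def predict(belief_bad, p_fail):
    """Propagate the belief one period under 'leave alone'.
    Only deterioration can occur: good -> bad with probability p_fail,
    and a bad system stays bad."""
    return belief_bad + (1.0 - belief_bad) * p_fail


def correct(belief_bad, alarm, p_alarm_bad, p_alarm_good):
    """Bayes update of the belief after a noisy binary observation.
    p_alarm_bad / p_alarm_good are assumed alarm probabilities in the
    bad and good states respectively."""
    like_bad = p_alarm_bad if alarm else 1.0 - p_alarm_bad
    like_good = p_alarm_good if alarm else 1.0 - p_alarm_good
    num = like_bad * belief_bad
    den = num + like_good * (1.0 - belief_bad)
    # If the observation has zero probability, keep the prior belief.
    return num / den if den > 0.0 else belief_bad


def step(belief_bad, action, alarm, p_fail, p_alarm_bad, p_alarm_good):
    """One decision period. 'replace' is assumed to reset the system
    to the good state; 'leave' runs predict followed by correct."""
    if action == "replace":
        return 0.0
    prior = predict(belief_bad, p_fail)
    return correct(prior, alarm, p_alarm_bad, p_alarm_good)


if __name__ == "__main__":
    b = 0.0
    for alarm in (False, True, True):
        b = step(b, "leave", alarm, 0.1, 0.9, 0.2)
        print(round(b, 4))
```

Because the belief is a single number in [0, 1], a replacement rule in this setting can be expressed as a function of that number, which is why the problem can be viewed as a Markov chain over a continuous state space.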

### Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

• Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
• KSII Transactions on Internet and Information Systems (TIIS)
• /
• v.7 no.5
• /
• pp.1036-1057
• /
• 2013
• A wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome. Energy-efficient scheduling is one of these challenges, given the scarcity of available energy in biosensors and the lack of portability. Therefore, researchers from academia, industry, and the health sector are working together to realize practical solutions for these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) have been proposed to tackle this issue. An MDP is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A Partially Observable Markov Decision Process (POMDP) is a generalization of the MDP that allows for incomplete information regarding the state of the system. In this case, the state is not visible to the agent. This has many applications in operations research and artificial intelligence. Because knowledge of the system is incomplete, formulating and solving POMDP models is mathematically complex and computationally expensive. Limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and, in particular, in WBAN. In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the domain of WBAN.
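
The MDP ingredients this abstract defines (an action-dependent transition matrix, a reward that depends on state and action, and a policy maximizing the expected discounted sum of rewards) can be made concrete with a small value-iteration sketch. The model below is hypothetical and unrelated to any WBAN system in the survey; the numbers, the discount factor `GAMMA`, and the function name `value_iteration` are assumptions chosen for illustration.

```python
# Hypothetical MDP; none of these numbers come from the survey.
# P[a][s][t]: probability of moving from state s to state t when
# action a is taken, so each action has its own transition matrix.
P = [
    [[0.5, 0.5], [0.4, 0.6]],  # action 0
    [[0.8, 0.2], [0.7, 0.3]],  # action 1
]
# R[s][a]: reward received in state s when action a is taken.
R = [
    [6.0, 4.0],    # state 0
    [-3.0, -5.0],  # state 1
]
GAMMA = 0.9  # assumed discount factor


def value_iteration(P, R, gamma, tol=1e-9):
    """Return (values, policy) for the expected discounted reward
    criterion. policy[s] is the action chosen in state s."""
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states
    while True:
        # q[s][a]: reward now plus discounted expected value afterwards.
        q = [
            [
                R[s][a]
                + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        new_v = [max(row) for row in q]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            policy = [row.index(max(row)) for row in q]
            return new_v, policy
        v = new_v


if __name__ == "__main__":
    values, policy = value_iteration(P, R, GAMMA)
    print(policy, [round(x, 3) for x in values])
```

In a POMDP the same backup would be applied to beliefs over states rather than to the states themselves, which is one reason the partially observable case is so much more expensive to solve.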

### An Energy-Efficient Transmission Strategy for Wireless Sensor Networks (무선 센서 네트워크에서 에너지 효율적인 전송 방안에 관한 연구)

• Phan, Van Ca;Kim, Jeong-Geun
• Journal of Internet Computing and Services
• /
• v.10 no.3
• /
• pp.85-94
• /
• 2009
• In this work we propose an energy-efficient transmission strategy for wireless sensor networks that operate in a strictly energy-constrained environment. Our transmission algorithm consists of two components: a binary-decision-based transmission and a channel-aware backoff adjustment. In the binary-decision-based transmission, we obtain the optimum threshold for successful transmission via a Markov decision process (MDP) formulation. The channel-aware backoff adjustment, the second component of our proposal, is introduced to give transmission priority to sensor nodes that see a better channel. Extensive simulations are performed to verify the performance of our proposal over fading wireless channels.

### The Minimum-cost Network Selection Scheme to Guarantee the Periodic Transmission Opportunity in the Multi-band Maritime Communication System (멀티밴드 해양통신망에서 전송주기를 보장하는 최소 비용의 망 선택 기법)

• Cho, Ku-Min;Yun, Chang-Ho;Kang, Chung-G
• The Journal of Korean Institute of Communications and Information Sciences
• /
• v.36 no.2A
• /
• pp.139-148
• /
• 2011
• This paper presents a minimum-cost network selection scheme that determines the transmission instance in the multi-band maritime communication system, so that shipment-related real-time information can be transmitted within the maximum allowed period. The transmission instances and the corresponding network selection process are modeled as a Markov Decision Process (MDP), with the channel modeled as a 2-state Markov chain, which can be solved by stochastic dynamic programming. The paper derives a minimum-cost network selection rule that can reduce the network cost significantly compared with the straightforward scheme of periodic transmission.

### Decision-Tree-Based Markov Model for Phrase Break Prediction

• Kim, Sang-Hun;Oh, Seung-Shin
• ETRI Journal
• /
• v.29 no.4
• /
• pp.527-529
• /
• 2007
• In this paper, a decision-tree-based Markov model for phrase break prediction is proposed. The model takes advantage of the non-homogeneous-feature-based classification ability of the decision tree and of temporal break sequence modeling based on the Markov process. For this experiment, a text corpus tagged with parts of speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized, and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.

### Network Security Situation Assessment Method Based on Markov Game Model

• Li, Xi;Lu, Yu;Liu, Sen;Nie, Wei
• KSII Transactions on Internet and Information Systems (TIIS)
• /
• v.12 no.5
• /
• pp.2414-2428
• /
• 2018
• To address the problem that current network security situation assessment methods focus only on attack behaviors, this paper proposes a network security situation assessment method based on the Markov Decision Process and game theory. The method takes the Markov Game model as its core and uses four-level data fusion to evaluate the network security situation. In this process, the Nash equilibrium point of the game is used to determine the impact on network security. Experiments show that the results of this method are basically consistent with the expert evaluation data. Because the method takes full account of the interaction between attackers and defenders, it is closer to reality and can accurately assess the network security situation.

### Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

• Chang, Hyeong-Soo
• International Journal of Control, Automation, and Systems
• /
• v.1 no.3
• /
• pp.358-367
• /
• 2003
• We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the "localization" concept, in which a global MDP is localized for each agent such that each agent needs to consider only a local MDP defined by its own state and action spaces. Based on that, we present a gradient-ascent-like iterative distributed algorithm that converges to a local optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.