• Title/Abstract/Keyword: Markov Decision Process

Search results: 131

Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo
    • International Journal of Control, Automation, and Systems
    • /
    • Vol. 1, No. 3
    • /
    • pp.358-367
    • /
    • 2003
  • We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment for the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the "localization" concept, whereby a global MDP is localized for each agent such that each agent needs to consider only a local MDP defined on its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a local optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.
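
The paper's algorithm is distributed and gradient-ascent-like; as a rough illustration of the localization idea only, the sketch below substitutes a simpler best-response iteration: each agent repeatedly solves a local MDP defined on its own state and action spaces, with the joint reward marginalized through the other agent's stationary distribution. The two-agent toy, all numbers, and the best-response scheme are assumptions for illustration, not the paper's method.

```python
import numpy as np
from itertools import product

# Toy factored MDP: two agents, two local states and actions each.
# Transitions are per-agent (factored); the reward couples both local states.
nS, nA = 2, 2
P = [np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[i][a][s, s'] for agent i
               [[0.3, 0.7], [0.6, 0.4]]]),
     np.array([[[0.8, 0.2], [0.5, 0.5]],
               [[0.1, 0.9], [0.7, 0.3]]])]
R = np.array([[1.0, 0.2], [0.1, 2.0]])     # joint reward R[s0, s1]

def chain(i, pol):                          # policy-induced local chain
    return np.stack([P[i][pol[s]][s] for s in range(nS)])

def stat(M):                                # stationary distribution of M
    w, v = np.linalg.eig(M.T)
    d = np.real(v[:, np.argmax(np.real(w))])
    return d / d.sum()

def localize(i, other_d):                   # local reward: marginalize the other agent out
    return R @ other_d if i == 0 else R.T @ other_d

def best_response(i, other_d):              # solve the tiny local average-reward MDP
    r = localize(i, other_d)
    best = max(product(range(nA), repeat=nS),
               key=lambda pol: stat(chain(i, pol)) @ r)
    return np.array(best)

pols = [np.zeros(nS, int), np.zeros(nS, int)]
for sweep in range(20):                     # iterate best responses to a fixed point
    new = [best_response(0, stat(chain(1, pols[1]))),
           best_response(1, stat(chain(0, pols[0])))]
    if all((a == b).all() for a, b in zip(new, pols)):
        break
    pols = new
print("autonomous joint policy:", [p.tolist() for p in pols])
```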

Seamless Mobility of Heterogeneous Networks Based on Markov Decision Process

  • Preethi, G.A.;Chandrasekar, C.
    • Journal of Information Processing Systems
    • /
    • Vol. 11, No. 4
    • /
    • pp.616-629
    • /
    • 2015
  • A mobile terminal can expect a number of handoffs within its call duration. During a mobile call, when a mobile node moves from one cell to another, it should connect to another access point within its range. When its own network cannot support it, it must change over to another base station. When moving to another network, quality-of-service parameters need to be considered. In our study we used the Markov decision process approach for seamless handoff, as it gives optimal results for selecting a network compared with other multiple-attribute decision-making processes. We used a network cost function for selecting the network for handoff and a connection reward function based on the values of the quality-of-service parameters. We also examined the constant bit rate and the transmission control protocol packet delivery ratio. We used the policy iteration algorithm to determine the optimal policy. Our enhanced handoff algorithm outperforms previous multiple-attribute decision-making methods.
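
Since the abstract names policy iteration as the solver, here is a minimal policy-iteration sketch for a toy vertical-handoff MDP; the three networks, the uniform availability dynamics, and the reward/cost/switch-penalty numbers are illustrative assumptions, not the paper's measured network cost function.

```python
import numpy as np

nets = ["WLAN", "UMTS", "WiMAX"]
nS = nA = len(nets)
gamma = 0.9
# P[a][s, s']: network availability dynamics after choosing network a
P = np.full((nA, nS, nS), 1.0 / nS)
reward = np.array([8.0, 5.0, 6.0])          # QoS connection reward per network
cost = np.array([1.0, 0.5, 0.8])            # per-network usage cost
switch_cost = 2.0                           # penalty for changing networks

def r(s, a):
    return reward[a] - cost[a] - (switch_cost if a != s else 0.0)

pol = np.zeros(nS, dtype=int)
while True:
    # policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly
    P_pi = np.stack([P[pol[s], s] for s in range(nS)])
    r_pi = np.array([r(s, pol[s]) for s in range(nS)])
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    # policy improvement: greedy one-step lookahead
    Q = np.array([[r(s, a) + gamma * P[a, s] @ V for a in range(nA)]
                  for s in range(nS)])
    new = Q.argmax(axis=1)
    if (new == pol).all():
        break
    pol = new
print({nets[s]: nets[pol[s]] for s in range(nS)})   # current net -> chosen net
```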

A MARKOV DECISION PROCESSES FORMULATION FOR THE LINEAR SEARCH PROBLEM

  • Balkhi, Z.T.;Benkherouf, L.
    • 한국경영과학회지
    • /
    • Vol. 19, No. 1
    • /
    • pp.201-206
    • /
    • 1994
  • The linear search problem is concerned with finding a hidden target on the real line R. The position of the target is governed by some probability distribution, and it is desired to find the target in the least expected search time. This problem has been formulated as an optimization problem by a number of authors without making use of Markov Decision Process (MDP) theory. The aim of this paper is to give an MDP formulation of the search problem which we feel is both natural and easy to follow.
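
As a hedged illustration of treating the search as a sequential decision problem, the sketch below discretizes the line, enumerates a small family of zigzag plans (each turning point playing the role of an action), and scores each plan by its expected search time; the grid, target distribution, and plan family are assumptions, not the paper's MDP formulation.

```python
import numpy as np

N = 3
xs = np.arange(-N, N + 1)
p = np.exp(-0.5 * xs**2)
p /= p.sum()                               # target distribution on [-N, N]

def expected_time(turns):
    """Expected first-visit time of the zigzag plan given by turning points."""
    pos, t, found = 0, 0.0, {0: 0.0}
    for tp in turns:
        step = 1 if tp > pos else -1
        while pos != tp:                   # unit-speed searcher starting at 0
            pos += step
            t += 1.0
            found.setdefault(pos, t)       # record the first visit only
    if len(found) < len(xs):               # plan failed to cover the interval
        return float("inf")
    return sum(p[i] * found[x] for i, x in enumerate(xs))

# enumerate alternating plans: right to a, left to -b, right to N, left to -N
best = min(((a, -b, N, -N) for a in range(1, N + 1) for b in range(1, N + 1)),
           key=expected_time)
print("best turning points:", best, "E[time] =", round(expected_time(best), 3))
```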

RAM 파라미터와 비용을 고려한 설계대안 분석 프로그램 개발 (Development of Design Alternative Analysis Program Considering RAM Parameter and Cost)

  • 김한솔;최성대;허장욱
    • 한국기계가공학회지
    • /
    • Vol. 18, No. 6
    • /
    • pp.1-8
    • /
    • 2019
  • Modern weapon systems are multifunctional, with capabilities for executing complex missions. However, they are required to be highly reliable, which increases their total cost of ownership. Because it is necessary to produce the best results within a limited budget, there is increasing interest in development, acquisition, and maintenance costs, and hence a need for tools that calculate the lifecycle costs of weapon systems to facilitate decision making. In this study, to meet these requirements, we propose a cost calculation function based on the Markov process simulator, a reliability, availability, and maintainability (RAM) analysis tool developed by applying the Markov-Monte Carlo method, to facilitate decision making in systems development.
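
A minimal Monte Carlo sketch of a two-state (up/down) Markov availability model with a running cost tally, in the spirit of the Markov-Monte Carlo approach mentioned above; the failure/repair rates, cost figures, and horizon are illustrative assumptions.

```python
import random

LAMBDA, MU = 1 / 500.0, 1 / 24.0      # failure and repair rates (per hour)
C_REPAIR, C_DOWN = 1500.0, 80.0       # cost per repair, cost per downtime hour
HORIZON, RUNS = 10_000.0, 2_000       # mission hours, Monte Carlo replications

def simulate(rng):
    t, up, uptime, cost = 0.0, True, 0.0, 0.0
    while t < HORIZON:
        dwell = rng.expovariate(LAMBDA if up else MU)   # sojourn in this state
        dwell = min(dwell, HORIZON - t)
        if up:
            uptime += dwell
        else:
            cost += C_DOWN * dwell
        t += dwell
        if t < HORIZON:
            if up:
                cost += C_REPAIR      # a failure occurred, pay the repair
            up = not up
    return uptime / HORIZON, cost

rng = random.Random(1)
results = [simulate(rng) for _ in range(RUNS)]
A = sum(a for a, _ in results) / RUNS
C = sum(c for _, c in results) / RUNS
print(f"availability ~ {A:.4f}, expected cost ~ {C:,.0f}")
```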

Two-Dimensional POMDP-Based Opportunistic Spectrum Access in Time-Varying Environment with Fading Channels

  • Wang, Yumeng;Xu, Yuhua;Shen, Liang;Xu, Chenglong;Cheng, Yunpeng
    • Journal of Communications and Networks
    • /
    • Vol. 16, No. 2
    • /
    • pp.217-226
    • /
    • 2014
  • In this research, we study the problem of opportunistic spectrum access (OSA) in a time-varying environment with fading channels, where the channel state is characterized by both channel quality and the occupancy of primary users (PUs). First, a finite-state Markov channel model is introduced to represent a fading channel. Second, by jointly probing channel quality and exploring the activities of PUs, a two-dimensional partially observable Markov decision process framework is proposed for OSA. In addition, a greedy strategy is designed in which a secondary user (SU) selects the channel with the best expected data transmission rate to maximize the instantaneous reward in the current slot. Compared with the optimal strategy, which considers future rewards, the greedy strategy offers low complexity and near-ideal performance. The spectrum sensing error that causes collisions between a PU and an SU is also discussed. Furthermore, we analyze the multiuser situation in which every SU adopts the proposed single-user strategy. Simulation results show that the proposed strategy attains a larger throughput than previous works under various parameter configurations.
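
A small sketch of the greedy rule under simplifying assumptions: each channel carries one joint (occupancy x quality) Markov chain, the probed channel is observed perfectly, and the transition matrix and rates are illustrative.

```python
import numpy as np

nCh = 4
states = [("idle", "good"), ("idle", "bad"), ("busy", "good"), ("busy", "bad")]
rate = np.array([3.0, 1.0, 0.0, 0.0])        # achievable rate per joint state
T = np.array([[0.6, 0.2, 0.15, 0.05],        # joint state transition matrix
              [0.25, 0.55, 0.05, 0.15],
              [0.3, 0.1, 0.45, 0.15],
              [0.1, 0.3, 0.15, 0.45]])
belief = np.full((nCh, 4), 0.25)             # uniform initial beliefs

rng = np.random.default_rng(0)
true_state = rng.integers(0, 4, size=nCh)
for slot in range(5):
    belief = belief @ T                      # predict all beliefs one slot ahead
    k = int((belief @ rate).argmax())        # greedy: best expected rate now
    true_state = np.array([rng.choice(4, p=T[s]) for s in true_state])
    obs = true_state[k]                      # sense and probe the chosen channel
    post = np.zeros(4)
    post[obs] = 1.0                          # perfect observation assumed here
    belief[k] = post
    print(f"slot {slot}: chose ch{k}, observed {states[obs]}")
```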

두 계층 공급사슬 모형에서 발주정책에 대한 수요 변동성 영향 (Demand Variability Impact on the Replenishment Policy in a Two-Echelon Supply Chain Model)

  • 김은갑
    • 한국경영과학회지
    • /
    • Vol. 29, No. 3
    • /
    • pp.111-127
    • /
    • 2004
  • We consider a supply chain model with a make-to-order production facility and a single supplier. The model we treat here is a special case of a two-echelon inventory model. Unlike classical two-echelon systems, the demand process at the supplier is affected by the production process at the production facility as well as by the customer order arrival process. In this paper, we address how demand variability impacts the optimal replenishment policy. To this end, we incorporate Erlang and phase-type demand distributions into the model. Formulating the model as a Markov decision problem, we investigate the structure of the optimal replenishment policy. We also implement a sensitivity analysis on the optimal policy and establish its monotonicity with respect to system cost parameters.
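
The paper's model is a two-echelon Markov decision problem; as a much-simplified stand-in, the sketch below runs value iteration on a single-location replenishment MDP with a discretized Erlang-like demand, only to show how the Markov-decision formulation exposes the optimal order quantity state by state. The costs, state space, and demand discretization are assumptions.

```python
import numpy as np
from math import exp, factorial

M, Qmax, gamma = 10, 10, 0.95
h, b, K, c = 1.0, 8.0, 3.0, 2.0              # holding, shortage, setup, unit
k_sh, lam = 2, 1.0                           # Erlang shape and rate for demand

# Erlang(k, lam) density evaluated on 0..M and renormalized (crude discretization)
pmf = np.array([lam**k_sh * d**(k_sh - 1) * exp(-lam * d) / factorial(k_sh - 1)
                if d > 0 else 0.0 for d in range(M + 1)])
pmf /= pmf.sum()

V = np.zeros(M + 1)
for _ in range(500):
    newV, pol = np.zeros(M + 1), np.zeros(M + 1, int)
    for s in range(M + 1):
        best = np.inf
        for q in range(0, min(Qmax, M - s) + 1):
            y = s + q                        # post-order inventory position
            exp_cost = sum(pmf[d] * (h * max(y - d, 0) + b * max(d - y, 0)
                           + gamma * V[max(y - d, 0)]) for d in range(M + 1))
            cost = (K if q > 0 else 0.0) + c * q + exp_cost
            if cost < best:
                best, pol[s] = cost, q
        newV[s] = best
    if np.abs(newV - V).max() < 1e-6:        # value iteration has converged
        break
    V = newV
print("order quantity by inventory level:", pol.tolist())
```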

Partially Observable Markov Decision Process with Lagged Information over Infinite Horizon

  • Jeong, Byong-Ho;Kim, Soung-Hie
    • 한국경영과학회지
    • /
    • Vol. 16, No. 1
    • /
    • pp.135-146
    • /
    • 1991
  • This paper presents an infinite-horizon model of a Partially Observable Markov Decision Process with lagged information, where the lagged information is an uncertain, delayed observation of the process under control. Even though an optimal policy of the model exists, finding it is very time consuming. Thus, the aim of this study is to find an ε-optimal stationary policy minimizing the expected discounted total cost of the model. An ε-optimal policy is found by using a modified version of the well-known policy iteration algorithm, where the modification focuses on the value-determination routine of the algorithm. Some properties of the approximation functions for the expected discounted cost of a stationary policy are presented, and the expected discounted cost of a stationary policy is approximated based on these properties. A numerical example is also shown.
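
A compact sketch of a modified policy iteration of the kind described: the value-determination routine is approximated by a few successive-approximation sweeps rather than an exact solve, and iteration stops once the greedy policy is provably ε-optimal. The random, fully observable MDP is an illustrative stand-in for the paper's POMDP with lagged information.

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, eps, m = 6, 3, 0.9, 1e-3, 20
P = rng.dirichlet(np.ones(nS), size=(nA, nS))    # P[a][s] is a distribution
R = rng.random((nS, nA))

V = np.zeros(nS)
while True:
    # policy improvement: one Bellman backup and its greedy policy
    Q = R + gamma * np.einsum("asx,x->sa", P, V)
    newV, pol = Q.max(axis=1), Q.argmax(axis=1)
    # standard stopping rule: greedy policy is now eps-optimal
    if np.abs(newV - V).max() < eps * (1 - gamma) / (2 * gamma):
        break
    V = newV
    for _ in range(m):                           # approximate value determination
        V = R[np.arange(nS), pol] + gamma * np.einsum(
            "sx,x->s", P[pol, np.arange(nS)], V)
print("epsilon-optimal stationary policy:", pol.tolist())
```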

마르코프 결정 과정에서 시뮬레이션 기반 정책 개선의 효율성 향상을 위한 시뮬레이션 샘플 누적 방법 연구 (A Simulation Sample Accumulation Method for Efficient Simulation-based Policy Improvement in Markov Decision Process)

  • 황시랑;최선한
    • 한국멀티미디어학회논문지
    • /
    • Vol. 23, No. 7
    • /
    • pp.830-839
    • /
    • 2020
  • As a popular mathematical framework for modeling decision making, the Markov decision process (MDP) has been widely used to solve problems in many engineering fields. An MDP consists of a set of discrete states, a finite set of actions, and rewards received after reaching a new state by taking an action from the previous state. The objective of an MDP is to find an optimal policy, that is, the best action to be taken in each state to maximize the expected discounted reward of the policy (EDR). In practice, the MDP is typically unknown, so simulation-based policy improvement (SBPI), which sequentially improves a given base policy by selecting the best action in each state depending on rewards observed via simulation, can be a practical way to find the optimal policy. However, the efficiency of SBPI is still a concern, since many simulation samples are required to precisely estimate the EDR for each action in each state. In this paper, we propose a method to select the best action accurately in each state using a small number of simulation samples, thereby improving the efficiency of SBPI. The proposed method accumulates the simulation samples observed in previous states, making it possible to estimate the EDR precisely even with a small number of samples in the current state. Comparative experiments against the existing method demonstrate that the proposed method improves the efficiency of SBPI.
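
A hedged sketch of the sample-accumulation idea: each rollout's suffix discounted returns are credited to every (state, action) pair the rollout visits, so later states begin with samples already banked and need fewer fresh rollouts. The toy chain, reward, and crediting scheme are assumptions, not the paper's benchmark.

```python
import random

nS, nA, gamma, rollouts, depth = 5, 2, 0.9, 20, 25
rng = random.Random(0)

def step(s, a):                              # toy dynamics and reward
    s2 = (s + (1 if a == 1 else rng.choice([-1, 1]))) % nS
    return s2, (1.0 if s2 == 0 else 0.0)

base = [0] * nS                              # base policy to improve
bank = {(s, a): [] for s in range(nS) for a in range(nA)}

def rollout(s0, a0):
    visits, rewards, s, a = [], [], s0, a0
    for _ in range(depth):
        visits.append((s, a))
        s, r = step(s, a)
        rewards.append(r)
        a = base[s]                          # follow the base policy afterwards
    G = 0.0
    for t in range(depth - 1, -1, -1):       # suffix discounted returns
        G = rewards[t] + gamma * G
        bank[visits[t]].append(G)            # accumulate for every visit

improved = []
for s in range(nS):                          # SBPI sweep over states
    for a in range(nA):
        while len(bank[(s, a)]) < rollouts:  # top up only if the bank is short
            rollout(s, a)
    improved.append(max(range(nA),
                        key=lambda a: sum(bank[(s, a)]) / len(bank[(s, a)])))
print("base:", base, "-> improved:", improved)
```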

강화학습을 이용한 이종 장비 토목 공정 계획 (Earthwork Planning via Reinforcement Learning with Heterogeneous Construction Equipment)

  • 지민기;박준건;김도형;정요한;박진규;문일철
    • 한국시뮬레이션학회논문지
    • /
    • Vol. 27, No. 1
    • /
    • pp.1-13
    • /
    • 2018
  • Earthwork planning is one of the important tasks in construction process management. Methodologies such as optimization techniques based on mathematical models, heuristic-based optimization techniques, and agent-based simulation have been applied to construction process management. In this study, we developed a virtual earthwork environment and proposed a method for finding optimal earthwork paths through reinforcement-learning-based simulation in that environment. For the reinforcement learning, we used two Markov decision process (MDP) formulations, one based on sequential learning and one based on independent learning, for interacting excavator and truck agents that take different actions. Simulation results showed that both methods can produce near-optimal earthwork plans in the virtual environment, and such plans could serve as a foundation for construction automation.
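
A toy sketch of the sequential-learning variant described above: an excavator and a truck share a soil buffer, each keeps its own Q-table, and the excavator is trained first while the truck follows its current greedy policy. The environment, rewards, and hyperparameters are illustrative stand-ins for the paper's earthwork simulator.

```python
import random

BUF, EPISODES, A, G, EPS = 3, 4000, 0.3, 0.95, 0.15
ACTS = ["work", "wait"]
rng = random.Random(1)

def greedy(Q, s, explore):
    if explore and rng.random() < EPS:
        return rng.choice(ACTS)
    return max(ACTS, key=lambda a: Q.get((s, a), 0.0))

def episode(Qx, Qt, learn=(True, True)):
    buf, total = 0, 0.0                  # buffer of dug-but-unhauled soil
    for _ in range(30):
        ax = greedy(Qx, buf, learn[0])   # excavator action
        at = greedy(Qt, buf, learn[1])   # truck action
        r, nbuf = -0.1, buf              # shared per-step time cost
        if ax == "work" and buf < BUF:
            nbuf += 1                    # excavator digs into the buffer
        if at == "work" and nbuf > 0:
            nbuf -= 1
            r += 1.0                     # a truckload reaches the dump
        for Q, a, on in ((Qx, ax, learn[0]), (Qt, at, learn[1])):
            if on:                       # standard Q-learning update
                best = max(Q.get((nbuf, b), 0.0) for b in ACTS)
                Q[(buf, a)] = (1 - A) * Q.get((buf, a), 0.0) + A * (r + G * best)
        buf, total = nbuf, total + r
    return total

Qx, Qt = {}, {}
for _ in range(EPISODES):                # sequential: excavator first, ...
    episode(Qx, Qt, learn=(True, False))
for _ in range(EPISODES):                # ... then the truck against it
    episode(Qx, Qt, learn=(False, True))
print("sequential return:", round(episode(Qx, Qt, learn=(False, False)), 2))
```

In this sketch the independent-learning variant would simply train both Q-tables at once, i.e. `episode(Qx, Qt, learn=(True, True))` every episode.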

Markov chain-based mass estimation method for loose part monitoring system and its performance

  • Shin, Sung-Hwan;Park, Jin-Ho;Yoon, Doo-Byung;Han, Soon-Woo;Kang, To
    • Nuclear Engineering and Technology
    • /
    • Vol. 49, No. 7
    • /
    • pp.1555-1562
    • /
    • 2017
  • A loose part monitoring system is used to identify unexpected loose parts in a nuclear reactor vessel or steam generator. A new method is still needed for the mass estimation of loose parts, one function of a loose part monitoring system, because conventional methods such as Hertz's impact theory and the frequency ratio method have high estimation errors. The purpose of this study is to propose a mass estimation method using a Markov decision process and to compare its performance with that of a method using an artificial neural network model proposed in a previous study. First, we explain how to extract feature vectors using the discrete cosine transform. Second, Markov chains are designed with codebooks obtained from the feature vectors. A 1/8-scaled mockup of the reactor vessel for OPR1000 was employed, and all signals were obtained by impacting its surface with several solid spherical masses. Next, the performance of mass estimation by the proposed Markov model was compared with that of the artificial neural network model. The proposed Markov model achieved a matching error below 20% in mass estimation, a performance similar to that of the artificial neural network model and considerably better than that of the conventional methods.
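
A hedged sketch of the pipeline as described: frame the impact signal, take low-order DCT coefficients per frame, vector-quantize against a codebook, fit one Laplace-smoothed transition matrix per known mass class, and score a test signal by Markov-chain log-likelihood. The synthetic ring-down signals, the codebook construction, and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
FRAME, K = 16, 4                             # frame length, codebook size

def dct2(x):                                 # small DCT-II, enough for a demo
    n = len(x)
    k = np.arange(n)
    return np.cos(np.pi * (k[:, None] + 0.5) * k[None, :] / n).T @ x

def features(sig):                           # low-order DCT coefficients per frame
    frames = sig[:len(sig) // FRAME * FRAME].reshape(-1, FRAME)
    return np.array([dct2(f)[1:4] for f in frames])

def impact(mass, n=400):                     # synthetic decaying ring-down signal
    t = np.arange(n)
    f0 = 0.3 / np.sqrt(mass)                 # heavier mass -> lower frequency
    return (np.exp(-t / (80 * mass)) * np.sin(2 * np.pi * f0 * t)
            + 0.05 * rng.standard_normal(n))

# codebook from pooled training features (k-means would be usual; random picks here)
train = {m: [features(impact(m)) for _ in range(20)] for m in (0.5, 1.0, 2.0)}
pool = np.vstack([F for Fs in train.values() for F in Fs])
codebook = pool[rng.choice(len(pool), K, replace=False)]

def quantize(F):                             # nearest codeword index per frame
    return np.argmin(((F[:, None] - codebook) ** 2).sum(-1), axis=1)

models = {}
for m, Fs in train.items():                  # one Markov chain per mass class
    T = np.ones((K, K))                      # Laplace-smoothed transition counts
    for F in Fs:
        seq = quantize(F)
        for a, b in zip(seq, seq[1:]):
            T[a, b] += 1
    models[m] = T / T.sum(axis=1, keepdims=True)

test = quantize(features(impact(1.0)))       # unknown impact to classify
score = {m: sum(np.log(T[a, b]) for a, b in zip(test, test[1:]))
         for m, T in models.items()}
print("estimated mass class:", max(score, key=score.get))
```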