• Title/Summary/Keyword: Markov Decision Process

A Study on M/M(a, b; $\mu_k$)/1 Batch Service Queueing Model

  • Lee, Hwa-Ki; Chung, Kyung-Il
    • Journal of Korean Institute of Industrial Engineers, v.21 no.3, pp.345-356, 1995
  • The aim of this paper is to analyze the batch service queueing model M/M(a, b; $\mu_k$)/1 under the general bulk service rule, with mean service rate $\mu_k$ for a batch of k units, where $a \leq k \leq b$. This queueing model has a two-dimensional state space, so it is characterized by a two-dimensional Markov process. The steady-state solution and performance measures of this process are derived using the matrix-geometric method. A new approach is also suggested for calculating the two-dimensional traffic density R, which is used to obtain the steady-state solution. In addition, to determine the optimal service initiation threshold a, a decision model of this queueing system is developed that evaluates the cost of service per batch and the cost of waiting per customer. In a job-order production system, the decision-making procedure presented in this paper can be applied to determine when production should start.
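
A minimal sketch of the matrix-geometric step mentioned in the abstract: for a quasi-birth-death (QBD) process with block matrices A0 (level up), A1 (local), A2 (level down), the rate matrix R is the minimal nonnegative solution of A0 + R·A1 + R²·A2 = 0 and can be found by fixed-point iteration. The 2x2 blocks below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def rate_matrix(A0, A1, A2, tol=1e-10, max_iter=10_000):
    """Minimal nonnegative solution R of A0 + R*A1 + R^2*A2 = 0,
    via the classic fixed-point iteration R <- -(A0 + R^2 A2) A1^{-1}."""
    A1_inv = np.linalg.inv(A1)
    R = np.zeros_like(A0)
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("R iteration did not converge")

# Hypothetical QBD blocks (illustrative only, not from the paper):
lam, mu = 1.0, 3.0
A0 = np.array([[lam, 0.0], [0.0, lam]])   # arrivals (level up)
A2 = np.array([[mu, 0.0], [mu, 0.0]])     # batch completions (level down)
A1 = -np.diag((A0 + A2).sum(axis=1))      # diagonal makes generator rows sum to 0

R = rate_matrix(A0, A1, A2)
print("spectral radius of R (stability requires < 1):",
      max(abs(np.linalg.eigvals(R))))
```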

Multiple Behaviors Learning and Prediction in Unknown Environment

  • Song, Wei; Cho, Kyung-Eun; Um, Ky-Hyun
    • Journal of Korea Multimedia Society, v.13 no.12, pp.1820-1831, 2010
  • When interacting with an unknown environment, an autonomous agent needs to decide which action or action order can result in a good state, and to determine the transition probability based on the current state and the action taken. Traditional multiple sequential learning models require a predefined probability for state transitions. This paper proposes a multiple sequential learning and prediction system with a definition of autonomous states to enhance the automatic performance of existing AI algorithms. In the sequence learning process, the sensed states are classified into several groups by a set of proposed motivation filters to reduce the learning computation. In the prediction process, the learning agent makes a decision based on an estimate of each state's cost, so as to obtain a high payoff from the given environment. The proposed learning and prediction algorithms improve the autonomous agent's automatic planning for interacting with dynamic unknown environments. The model was tested in a virtual library.
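
As a rough illustration of the grouping-plus-cost-estimation idea, the sketch below assumes hypothetical "motivation filters" (distance_filter, load_filter), a toy action set, and an incremental cost estimate per (group, action); none of these names or numbers come from the paper.

```python
from collections import defaultdict
import random

# Hypothetical stand-ins for the paper's motivation filters: each filter
# maps a raw sensed state to a coarse group label, shrinking the state space.
def distance_filter(state):   # e.g. near/far from a goal shelf
    return "near" if state["dist"] < 2.0 else "far"

def load_filter(state):       # e.g. carrying an item or not
    return "loaded" if state["carrying"] else "empty"

FILTERS = (distance_filter, load_filter)
ACTIONS = ("move", "pick", "place")

def group_of(state):
    return tuple(f(state) for f in FILTERS)

cost_est = defaultdict(float)   # running cost estimate per (group, action)
counts = defaultdict(int)

def choose_action(state, eps=0.1):
    g = group_of(state)
    if random.random() < eps:                       # occasional exploration
        return random.choice(ACTIONS)
    return min(ACTIONS, key=lambda a: cost_est[(g, a)])

def update(state, action, observed_cost):
    key = (group_of(state), action)
    counts[key] += 1
    # incremental mean of observed costs
    cost_est[key] += (observed_cost - cost_est[key]) / counts[key]

# toy interaction step
s = {"dist": 1.2, "carrying": False}
a = choose_action(s)
update(s, a, observed_cost=1.0)
print(a, dict(cost_est))
```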

Intelligent Update of Environment Model in Dynamic Environments through Generalized Stochastic Petri Net

  • Park, Joong-Tae; Lee, Yong-Ju; Song, Jae-Bok
    • Proceedings of the KIEE Conference, 2006.10c, pp.181-183, 2006
  • This paper proposes an intelligent decision framework for updating the environment model using GSPN (generalized stochastic Petri nets). The GSPN has several advantages over direct use of a Markov process: modeling, analysis, and performance evaluation are all conducted on a rigorous mathematical basis. By adopting this probabilistic approach, our decision framework helps a robot that navigates autonomously for long periods in dynamic environments to decide when to update its map. Experimental results show that the proposed scheme is useful for service robots that operate semi-permanently, and that it improves the dependability of navigation in dynamic environments.
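
The paper's GSPN machinery is not reproduced here; as a much simpler stand-in for the update-timing decision it supports, the sketch below tracks a two-state belief that the environment has changed since the last map update and remaps once that belief crosses a threshold. Both the per-step change rate and the threshold are assumed values.

```python
# A minimal belief-threshold sketch (a simplification, not the paper's GSPN):
# "has the environment changed since the last map update?" is modeled as a
# two-state Markov chain over time steps.
P_CHANGE_PER_STEP = 0.02   # assumed rate at which the environment drifts
UPDATE_THRESHOLD = 0.5     # update the map once P(changed) exceeds this

def decide_updates(n_steps):
    p_changed = 0.0
    for t in range(n_steps):
        # chance the environment changed by this step, given it had not yet
        p_changed = p_changed + (1.0 - p_changed) * P_CHANGE_PER_STEP
        if p_changed > UPDATE_THRESHOLD:
            yield t          # time at which the robot remaps
            p_changed = 0.0  # fresh map: reset the belief

print(list(decide_updates(200)))
```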

Energy-Saving Oriented On/Off Strategies in Heterogeneous Networks: An Asynchronous Approach with Dynamic Traffic Variations

  • Tang, Lun; Wang, Weili; Chen, Qianbin
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.11, pp.5449-5464, 2018
  • Recent works have validated the possibility of reducing energy consumption in heterogeneous wireless networks by dynamically switching some base stations (BSs) on or off. In this paper, to realize this energy conservation, a discrete-time Markov decision process (DTMDP) is developed to match BS switching operations to traffic load variations. An asynchronous decision-making algorithm, based on the Bellman equation and the on/off priorities of the BSs, is then put forward and proved to be optimal. By reducing the state and action spaces considered in each decision, the proposed asynchronous algorithm avoids the "curse of dimensionality" that frequently arises in DTMDPs. Finally, numerical simulations validate the effectiveness and advantages of the proposed asynchronous on/off strategies.
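
A compact value-iteration sketch of the DTMDP idea: states are (number of BSs on, traffic level), actions choose how many BSs stay on, and the stage cost trades energy against unserved load and switching. It illustrates the Bellman backup the abstract refers to, not the paper's actual model or its asynchronous algorithm; all numbers are assumptions.

```python
import itertools
import numpy as np

N_BS = 3
TRAFFIC = [0.2, 0.5, 0.9]                  # assumed load levels
P_TRAFFIC = np.array([[0.7, 0.3, 0.0],     # assumed traffic transition matrix
                      [0.2, 0.6, 0.2],
                      [0.0, 0.3, 0.7]])
ENERGY_PER_BS, QOS_WEIGHT, SWITCH_COST, GAMMA = 1.0, 10.0, 0.5, 0.9

def stage_cost(n_on, action, tr_idx):
    capacity = action / N_BS
    unserved = max(0.0, TRAFFIC[tr_idx] - capacity)
    return (ENERGY_PER_BS * action + QOS_WEIGHT * unserved
            + SWITCH_COST * abs(action - n_on))

states = list(itertools.product(range(N_BS + 1), range(len(TRAFFIC))))

def backup(s, a, V):
    n_on, tr = s
    future = sum(P_TRAFFIC[tr, t2] * V[(a, t2)] for t2 in range(len(TRAFFIC)))
    return stage_cost(n_on, a, tr) + GAMMA * future

V = {s: 0.0 for s in states}
for _ in range(300):                        # value iteration (Bellman backups)
    V = {s: min(backup(s, a, V) for a in range(N_BS + 1)) for s in states}

policy = {s: min(range(N_BS + 1), key=lambda a: backup(s, a, V)) for s in states}
print(policy)                               # how many BSs to keep on per state
```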

A Selectively Cumulative Sum (S-CUSUM) Control Chart

  • Lim, Tae-Jin
    • Journal of Korean Society for Quality Management, v.33 no.3, pp.126-134, 2005
  • This paper proposes a selectively cumulative sum (S-CUSUM) control chart for detecting shifts in the process mean. The basic idea of the S-CUSUM chart is to accumulate previous samples selectively in order to increase sensitivity. The chart employs a threshold limit to determine whether or not to accumulate previous samples: consecutive samples with control statistics outside the threshold limit are accumulated to calculate a standardized control statistic, while if the control statistic falls within the threshold limit, only the next sample is used. During the sampling process, the S-CUSUM chart produces an 'out-of-control' signal either when any control statistic falls outside the control limit or when L consecutive control statistics fall outside the threshold limit. The number L is a decision variable called the 'control length'. A Markov chain approach is employed to describe the S-CUSUM sampling process, and formulae for the steady-state probabilities and the average run length (ARL) during an in-control state are derived in closed form. Some properties useful for designing the statistical parameters are also derived, and a statistical design procedure for the S-CUSUM chart is proposed. Comparative studies show that the proposed S-CUSUM chart is uniformly superior to the CUSUM chart and the exponentially weighted moving average (EWMA) chart with respect to ARL performance.
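
The charting rule described above translates fairly directly into code. In the sketch below, the threshold limit T, control limit C, and control length L are illustrative choices, not the paper's designed values.

```python
import numpy as np

T, C, L = 1.0, 3.0, 5   # assumed threshold limit, control limit, control length

def s_cusum(z_values):
    """z_values: standardized sample statistics, N(0,1) while in control.
    Returns the index of the first out-of-control signal, or None."""
    acc, n_acc, run = 0.0, 0, 0
    for i, z in enumerate(z_values):
        acc += z
        n_acc += 1
        stat = acc / np.sqrt(n_acc)        # re-standardized cumulative statistic
        if abs(stat) > C:
            return i                       # signal: outside the control limit
        if abs(stat) > T:
            run += 1                       # outside threshold: keep accumulating
            if run >= L:
                return i                   # signal: L consecutive outside T
        else:
            acc, n_acc, run = 0.0, 0, 0    # within threshold: start afresh

rng = np.random.default_rng(1)
in_control = rng.normal(0.0, 1.0, 500)
shifted = rng.normal(1.0, 1.0, 500)        # mean shift of one sigma
print("signal (in control):", s_cusum(in_control))
print("signal (shifted):   ", s_cusum(shifted))
```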

Performance-based remaining life assessment of reinforced concrete bridge girders

  • Anoop, M.B.; Rao, K. Balaji; Raghuprasad, B.K.
    • Computers and Concrete, v.18 no.1, pp.69-97, 2016
  • Performance-based remaining life assessment of reinforced concrete bridge girders subject to chloride-induced corrosion of reinforcement is addressed in this paper. To this end, a methodology is proposed that takes into consideration the human judgmental aspects of expert decision making in condition state assessment. The condition of the bridge girder is specified by assigning a condition state from a set of predefined condition states, considering both serviceability and ultimate limit states, and the performance of the girder is described using a performability measure. A non-homogeneous Markov chain is used to model the stochastic evolution of the girder's condition state over time. The expert's thinking process in condition state assessment is modelled within a probabilistic framework using Brunswikian theory and probabilistic mental models. The remaining life is determined as the time over which the performance of the girder stays above the required performance level. The usefulness of the methodology is illustrated through the remaining life assessment of a reinforced concrete T-beam bridge girder.
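
A small sketch of the non-homogeneous Markov chain idea: the condition-state distribution is propagated with a transition matrix that worsens with time, and the remaining life is read off as the first year the performance measure falls below the target. The number of states, deterioration rates, and required performance level below are assumptions, not the paper's values.

```python
import numpy as np

N_STATES = 4                    # 0 = good ... 3 = unacceptable (assumed)
REQUIRED = 0.90                 # required P(condition better than worst state)

def transition_matrix(year):
    """Time-varying (non-homogeneous) one-year transition matrix."""
    p = min(0.05 + 0.004 * year, 0.5)    # assumed chance of dropping one state
    P = np.zeros((N_STATES, N_STATES))
    for s in range(N_STATES - 1):
        P[s, s], P[s, s + 1] = 1.0 - p, p
    P[-1, -1] = 1.0                       # worst state is absorbing
    return P

dist = np.array([1.0, 0.0, 0.0, 0.0])    # new girder: surely in the best state
for year in range(1, 101):
    dist = dist @ transition_matrix(year)
    if dist[:-1].sum() < REQUIRED:        # performance fell below the target
        print("estimated remaining life:", year, "years")
        break
```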

R-Trader: An Automatic Stock Trading System Based on Reinforcement Learning

  • 이재원; 김성동; 이종우; 채진석
    • Journal of KIISE: Software and Applications, v.29 no.11, pp.785-794, 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems, such as market trend prediction, stock selection, and trading strategies, in a unified framework. However, most previous trading systems based on supervised learning are limited in ultimate performance because they do not focus on integrating these subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement learning, which regards the process of stock price changes as a Markov decision process (MDP). Reinforcement learning is suitable for the joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) learning and Q-learning, for selecting stocks and optimizing other trading parameters, respectively. Technical analysis is adopted to devise the system's input features, and value functions are approximated by feedforward neural networks. Experimental results on the Korean stock market show that the proposed system outperforms both the market average and a simple trading system trained by supervised learning, in terms of profit as well as risk management.
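
A bare-bones Q-learning sketch of the MDP view the abstract takes. The state discretization, toy market transition, and reward definition below are stand-ins, not R-Trader's TD/Q design or its neural-network value approximation.

```python
import numpy as np

ACTIONS = ("buy", "hold", "sell")
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
N_STATES = 10                               # assumed discretized indicator buckets

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))

def step(state, action):
    """Toy market transition: random next state, return-based reward."""
    next_state = rng.integers(N_STATES)
    ret = rng.normal(0.0005, 0.01)          # hypothetical daily return
    reward = {"buy": ret, "hold": 0.0, "sell": -ret}[ACTIONS[action]]
    return next_state, reward

state = rng.integers(N_STATES)
for _ in range(10_000):
    a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[state]))
    next_state, r = step(state, a)
    # standard Q-learning backup
    Q[state, a] += ALPHA * (r + GAMMA * Q[next_state].max() - Q[state, a])
    state = next_state

print("greedy action per state:",
      [ACTIONS[int(np.argmax(Q[s]))] for s in range(N_STATES)])
```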

Learning Multi-Character Competition in Markov Games

  • Lee, Kang-Hoon
    • Journal of the Korea Computer Graphics Society, v.15 no.2, pp.9-17, 2009
  • Animating multiple characters to compete with each other is an important problem in computer games and animation films. However, simulating strategic competition among characters remains difficult because of the inherently complex decision process, which must cope with the often unpredictable behavior of opponents. We apply a reinforcement learning method for Markov games to action models built from captured motion data. This enables two characters to perform globally optimal counter-strategies with respect to each other. We also extend the method to simulate competition between two teams, each of which can consist of an arbitrary number of characters. We demonstrate the usefulness of our approach through various competitive scenarios, including playing tag, keeping distance, and shooting.
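
The core computation in minimax-style learning for two-player zero-sum Markov games, which this line of work builds on, is solving the matrix game defined by the Q-values at each state. The sketch below does that with a linear program over an illustrative 2x2 payoff table; the payoffs are assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q_s):
    """Maximin mixed strategy and value of a zero-sum matrix game Q_s,
    where the row player maximizes and the column player minimizes."""
    n, m = Q_s.shape
    # variables: x_1..x_n (row mixed strategy) and v (game value); minimize -v
    c = np.concatenate([np.zeros(n), [-1.0]])
    # constraint per column j: v - x^T Q[:, j] <= 0
    A_ub = np.hstack([-Q_s.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)   # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# toy payoff table for one state (e.g. chase vs. evade moves)
Q_s = np.array([[ 1.0, -1.0],
                [-0.5,  0.5]])
strategy, value = matrix_game_value(Q_s)
print("mixed strategy:", strategy.round(3), "value:", round(value, 3))
```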

Fire detection in video surveillance and monitoring system using Hidden Markov Models

  • Zhu, Teng; Kim, Jeong-Hyun; Kang, Dong-Joong; Kim, Min-Sung; Lee, Ju-Seoup
    • Proceedings of the Korea Information Processing Society Conference, 2009.04a, pp.35-38, 2009
  • This paper presents an effective method to detect fire in a video surveillance and monitoring system. The main contribution of this work is the successful use of hidden Markov models, together with a few preprocessing steps, in the fire detection process. First, moving pixels detected by image differencing, color values characteristic of fire flames, and clustering of those pixels are used to obtain image regions labeled as fire candidates. Second, hidden Markov models of fire and non-fire are trained on a large data set of fire and non-fire videos; these models make the final decision, in both temporal and spatial analysis, on whether a frame of the real-time video contains fire. Experimental results demonstrate that the method is robust and has a very low false alarm rate. Furthermore, because the HMM training, which takes up most of the processing time, is computed off-line, real-time detection and alarm can be implemented well compared with other existing methods.
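
A minimal sketch of the classification step: two small discrete HMMs (one for fire, one for non-fire, with made-up parameters) score a quantized feature sequence from a candidate region via the scaled forward algorithm, and the higher log-likelihood wins. The states, symbols, and probabilities are illustrative assumptions.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# hypothetical 2-state models over 3 observation symbols
pi = np.array([0.5, 0.5])
A_fire    = np.array([[0.3, 0.7], [0.7, 0.3]])   # fast flicker: frequent switching
A_nonfire = np.array([[0.9, 0.1], [0.1, 0.9]])   # steady regions: sticky states
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])

obs = [0, 2, 0, 2, 1, 0, 2, 0]                   # quantized features per frame
ll_fire = forward_loglik(obs, pi, A_fire, B)
ll_non = forward_loglik(obs, pi, A_nonfire, B)
print("fire" if ll_fire > ll_non else "non-fire", round(ll_fire - ll_non, 2))
```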

Approximate Dynamic Programming Based Interceptor Fire Control and Effectiveness Analysis for M-To-M Engagement

  • Lee, Changseok; Kim, Ju-Hyun; Choi, Bong Wan; Kim, Kyeongtaek
    • Journal of the Korean Society for Aeronautical & Space Sciences, v.50 no.4, pp.287-295, 2022
  • As the low-altitude long-range artillery threat has grown, development of an anti-artillery interception system to protect assets against such attacks is about to begin. We view defense against long-range artillery attacks as a typical dynamic weapon-target assignment (DWTA) problem. DWTA is a sequential decision process in which decision making under uncertain future attacks affects subsequent decisions and their results; these are the characteristic features of a Markov decision process (MDP). We formulate the problem as an MDP model to examine the assignment policy for the defender. The proximity of the South Korean capital to the North Korean border limits the computation time for a solution to a few seconds, and within the allowed time interval it is impossible to compute the exact optimal solution. We therefore apply an approximate dynamic programming (ADP) approach and check whether it can solve the MDP model within the processing time limit. We employ a Shoot-Shoot-Look policy as a baseline strategy and compare it with the ADP approach for three scenarios. Simulation results show that the ADP approach provides better solutions than the baseline strategy.
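
A toy sketch of the one-step-lookahead flavor of ADP for dynamic weapon-target assignment: each salvo option is scored by its immediate expected leakage plus a crude value for interceptors saved for later waves. The kill probability, threat values, and future-value proxy are assumptions, not the paper's model or its policy.

```python
import itertools

P_KILL = 0.7                          # assumed single-shot kill probability
FUTURE_VALUE_PER_INTERCEPTOR = 0.4    # crude proxy for saving shots for later

def expected_leakage(threat_values, salvos):
    """Expected leaked threat value if salvos[i] interceptors engage threat i."""
    return sum(v * (1 - P_KILL) ** n for v, n in zip(threat_values, salvos))

def adp_assign(threat_values, n_interceptors, max_salvo=2):
    """One-step lookahead: immediate cost plus an approximate cost-to-go."""
    best, best_cost = None, float("inf")
    options = itertools.product(range(max_salvo + 1), repeat=len(threat_values))
    for salvos in options:
        used = sum(salvos)
        if used > n_interceptors:
            continue
        cost = (expected_leakage(threat_values, salvos)
                + FUTURE_VALUE_PER_INTERCEPTOR * used)
        if cost < best_cost:
            best, best_cost = salvos, cost
    return best, best_cost

threats = [1.0, 0.6, 0.3]             # assumed value of each incoming threat
print(adp_assign(threats, n_interceptors=4))
```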