• Title/Summary/Keyword: Optimal Stochastic Policy

Application of Recent Approximate Dynamic Programming Methods for Navigation Problems (주행문제를 위한 최신 근사적 동적계획법의 적용)

  • Min, Dae-Hong;Jung, Keun-Woo;Kwon, Ki-Young;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.6
    • /
    • pp.737-742
    • /
    • 2011
  • Navigation problems include the task of determining control inputs, under various constraints, for systems such as mobile robots subject to uncertain disturbances. Such tasks can be modeled as constrained stochastic control problems. To solve these control problems, one may try to use dynamic programming (DP) methods, which rely on the concept of an optimal value function. However, in most real-world problems this approach runs into difficulties: the exact system model may not be known, computing the optimal control policy may be intractable, and/or a huge amount of computing resources may be needed. As a strategy to overcome these difficulties, one can use approximate dynamic programming (ADP) methods, which find suboptimal control policies by resorting to approximate value functions. In this paper, we apply recently proposed ADP methods to a class of navigation problems with complex constraints and observe the resulting performance characteristics.
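
The paper gives no implementation details; purely as an illustration of the ADP idea it describes, the following is a minimal fitted value iteration sketch on a toy grid navigation problem. The grid size, obstacle cells, and linear features are all assumptions of this example, not the authors' model.

```python
import numpy as np

# Fitted value iteration on an assumed 5x5 grid navigation MDP with
# obstacle constraints; the value function is approximated linearly.
N = 5
obstacles = {(2, 2), (3, 1)}          # assumed constraint cells
goal = (4, 4)
actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
gamma = 0.95

def step(s, a):
    ns = (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))
    if ns in obstacles:
        ns = s                         # constraint: cannot enter an obstacle
    reward = 0.0 if ns == goal else -1.0
    return ns, reward

def features(s):                       # simple linear features of the state
    return np.array([1.0, s[0] / N, s[1] / N, (s[0] * s[1]) / N**2])

w = np.zeros(4)                        # approximate value V(s) ~ w . phi(s)
states = [(i, j) for i in range(N) for j in range(N) if (i, j) not in obstacles]
for _ in range(200):                   # fitted value iteration sweeps
    targets = []
    for s in states:
        backups = []
        for a in actions:
            ns, r = step(s, a)
            backups.append(r + gamma * w @ features(ns))
        targets.append(max(backups))   # Bellman optimality backup
    phi = np.array([features(s) for s in states])
    w, *_ = np.linalg.lstsq(phi, np.array(targets), rcond=None)

# Suboptimal greedy policy induced by the approximate value function.
greedy = {s: max(actions, key=lambda a: step(s, a)[1] + gamma * w @ features(step(s, a)[0]))
          for s in states}
print(greedy[(0, 0)])
```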

A Comparison of Admission Controls of Reservation Requests with Callable Products (임의상환가능 상품 도입하의 예약 요청 승인 방법 비교)

  • Lee, Haeng-Ju
    • Journal of Digital Convergence
    • /
    • v.17 no.9
    • /
    • pp.127-133
    • /
    • 2019
  • A callable product is a service derivative that uses options to generate demand and reduce risk. This paper compares two booking admission controls for callable products: online and batch admission control. To this end, the paper computes the optimal booking policy using backward dynamic programming and stochastic optimization. Intuitively, the provider should do better under batch control because it can exploit demand information. The contribution of the paper is to show that the two controls are equivalent in terms of booking strategy and expected profit, which allows the provider to keep its current control method. The paper develops closed-form solutions for the three fare classes. Future work is to extend the result to models with more complicated fare structures.
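
As a rough illustration of the backward dynamic programming ingredient mentioned in the abstract, the sketch below computes optimal accept/reject booking decisions for a simple capacity-constrained system; the fares, arrival probabilities, and horizon are assumed values and do not reproduce the paper's callable-product model.

```python
import functools

# Backward dynamic program for accept/reject booking decisions with two
# fare classes, Bernoulli arrivals per period, and fixed capacity.
CAPACITY = 10
PERIODS = 20
FARES = [0.0, 100.0, 250.0]            # index 0 = no arrival (assumed)
ARRIVAL_PROB = [0.3, 0.5, 0.2]         # P(no arrival), P(low fare), P(high fare)

@functools.lru_cache(maxsize=None)
def value(t, remaining):
    """Maximum expected revenue from period t with `remaining` seats left."""
    if t == PERIODS or remaining == 0:
        return 0.0
    total = 0.0
    for k, p in enumerate(ARRIVAL_PROB):
        reject = value(t + 1, remaining)
        accept = FARES[k] + value(t + 1, remaining - 1) if k > 0 else reject
        total += p * max(accept, reject)   # optimal accept/reject decision
    return total

print(round(value(0, CAPACITY), 2))        # optimal expected revenue
```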

A Continuous Review(s, S) Inventory Model in which Depletion is Due to Demand and Loss of Units

  • Choi, Jin-Yeong;Kim, Man-Sik
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.11 no.1
    • /
    • pp.33-39
    • /
    • 1985
  • A stochastic model for an inventory system in which stock is depleted by random demand as well as random loss of items is studied, under the assumption that the intervals between successive unit demands, as well as those between successive unit losses, are independently and identically distributed random variables having negative exponential distributions with respective parameters. We derive the steady-state probability distribution of the stock level, assuming instantaneous delivery of orders under an (s, S) inventory policy. We also derive the total expected cost expression and the necessary conditions for an optimal solution.
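
A minimal Monte Carlo sketch of such a system, assuming illustrative demand and loss rates, is shown below; it estimates the steady-state stock-level distribution under an (s, S) policy with instantaneous replenishment.

```python
import random

# Continuous-review (s, S) system: stock drops one unit at a time either by
# demand (rate lam) or by loss (rate mu), with instantaneous replenishment
# up to S when the level reaches s. All parameter values are assumed.
lam, mu = 2.0, 0.5        # demand rate, loss rate (exponential intervals)
s, S = 3, 10
horizon = 100_000.0

time_at_level = {level: 0.0 for level in range(s + 1, S + 1)}
t, level = 0.0, S
while t < horizon:
    dwell = random.expovariate(lam + mu)   # time until the next demand or loss
    time_at_level[level] += dwell
    t += dwell
    level -= 1                             # one unit leaves (demand or loss)
    if level == s:
        level = S                          # instantaneous order delivery

dist = {k: v / t for k, v in time_at_level.items()}
print({k: round(p, 3) for k, p in dist.items()})   # estimated steady-state distribution
```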

A Study on the Inventory Model with Partial Backorders under the Lead Time Uncertainty (조달기간(調達期間)이 불확실(不確實)한 상황하에서의 부분부(部分負) 재고모형(在庫模型)에 관한 연구(硏究))

  • Lee, Kang-Woo;Lee, Sang-Do
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.17 no.1
    • /
    • pp.51-58
    • /
    • 1991
  • This paper presents a single-echelon, single-item, stochastic-lead-time, static-demand inventory model for situations in which, during the stockout period, a fraction ${\beta}$ of the demand is backordered and the remaining fraction $(1-{\beta})$ is lost. In this situation, an objective function representing the average annual cost of the inventory system is obtained by defining a time-proportional backorder cost and a fixed penalty cost per unit lost. The optimal operating policy variables minimizing the average annual cost are calculated iteratively. At the extreme ${\beta}=1$, the model reduces to the usual backorder case. A numerical example is solved to illustrate the algorithm developed.
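
For illustration only, the following sketch numerically searches for a cost-minimizing (Q, r) policy under partial backordering; the cost expression uses per-unit shortage penalties as a simplified stand-in for the paper's time-proportional backorder cost, and all parameter values are assumed.

```python
from statistics import NormalDist

# (Q, r) model with partial backordering: a fraction beta of shortages is
# backordered, the rest is lost. Demand, cost, and lead-time parameters are
# assumed toy values; lead-time demand is taken as Normal(mu_L, sigma_L).
D, K, h = 1000.0, 50.0, 2.0            # annual demand, order cost, holding cost
pi_b, pi_l = 5.0, 15.0                 # penalty per unit backordered / lost
beta = 0.8
mu_L, sigma_L = 100.0, 30.0
STD = NormalDist()                     # standard normal for the loss function

def expected_shortage(r):
    """E[(X - r)^+] for normal lead-time demand X (standard normal loss function)."""
    z = (r - mu_L) / sigma_L
    return sigma_L * (STD.pdf(z) - z * (1 - STD.cdf(z)))

def annual_cost(Q, r):
    B = expected_shortage(r)
    holding = h * (Q / 2 + r - mu_L + (1 - beta) * B)
    shortage = (D / Q) * (pi_b * beta + pi_l * (1 - beta)) * B
    return K * D / Q + holding + shortage

best = min(((annual_cost(Q, r), Q, r)
            for Q in range(50, 501, 5)
            for r in range(50, 301, 2)), key=lambda x: x[0])
print(f"cost={best[0]:.1f}, Q={best[1]}, r={best[2]}")
```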

Computation Offloading with Resource Allocation Based on DDPG in MEC

  • Sungwon Moon;Yujin Lim
    • Journal of Information Processing Systems
    • /
    • v.20 no.2
    • /
    • pp.226-238
    • /
    • 2024
  • Recently, multi-access edge computing (MEC) has emerged as a promising technology to alleviate the computing burden of vehicular terminals and efficiently support vehicular applications. A vehicle can improve the quality of experience of its applications by offloading their tasks to MEC servers. However, channel conditions are time-varying due to channel interference among vehicles, path loss is time-varying due to vehicle mobility, and task arrivals at the vehicles are stochastic. It is therefore difficult to determine an optimal offloading and resource allocation decision in the dynamic MEC system, because offloading is affected by wireless data transmission. In this paper, we study computation offloading with resource allocation in the dynamic MEC system. The objective is to minimize power consumption and maximize throughput while meeting the delay constraints of tasks; to this end, the method allocates computing resources for local execution and transmission power for offloading. We define the problem as a Markov decision process and propose an offloading method using a deep reinforcement learning algorithm, the deep deterministic policy gradient (DDPG). Simulations show that the proposed method outperforms existing methods in terms of throughput and satisfaction of delay constraints.
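
As a hedged sketch of the DDPG machinery referred to in the abstract (not the paper's full MEC formulation), the following shows one actor-critic update step for a continuous offloading/power action; state and action dimensions, network sizes, and hyperparameters are assumptions, and the batch is expected to come from a replay buffer.

```python
import torch
import torch.nn as nn

# Minimal DDPG update: a deterministic actor maps the state (e.g., channel
# gains, queue lengths) to continuous actions (offloading ratio, transmit
# power); a critic scores state-action pairs. Dimensions are assumed.
STATE_DIM, ACTION_DIM, GAMMA, TAU = 8, 2, 0.99, 0.005

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

actor, actor_tgt = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic, critic_tgt = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
actor_tgt.load_state_dict(actor.state_dict())
critic_tgt.load_state_dict(critic.state_dict())
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    s, a, r, s2, done = batch          # replay-buffer tensors; r, done are (B, 1) columns
    with torch.no_grad():              # TD target from the target networks
        q2 = critic_tgt(torch.cat([s2, torch.tanh(actor_tgt(s2))], dim=1))
        y = r + GAMMA * (1 - done) * q2
    c_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    a_loss = -critic(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()
    for net, tgt in ((actor, actor_tgt), (critic, critic_tgt)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * p.data)     # soft target update
```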

Determination of the profit-maximizing configuration for the modular cell manufacturing system using stochastic process (실시간 고장포용 생산시스템의 적정 성능 유지를 위한 최적 설계 기법에 관한 연구)

  • Park, Seung-Kyu
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.5 no.5
    • /
    • pp.614-621
    • /
    • 1999
  • In this paper, analytical approaches are presented for jointly determining the profit-maximizing configuration of a fault-tolerant, real-time modular cell manufacturing system. First, transient (time-dependent) analysis of Markovian models is applied to the modular cell manufacturing system from a performability viewpoint, whose modeling advantage lies in its ability to express the performance that truly matters (the user's perception of it), as well as various performance measures, compositely in the context of the application. The modular cells are modeled with a hybrid decomposition method, and availability measures such as instantaneous availability, interval availability, and expected cumulative operational time are then evaluated as special cases of performability. In addition, a sensitivity analysis of the entire manufacturing system, as well as of each machining cell, is performed, from which the timing of a major repair policy and the optimal configuration among the alternative configurations of the system can be determined. Second, the recovery policies are optimized: recovery from machine failures, by computing the minimal number of redundant machines, and recovery from task failures, by computing the minimum number of tasks equipped with failure-detection schemes and reworked upon failure detection, so that the timing requirements are met. Some numerical examples are presented to demonstrate the effectiveness of the work.
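
As a small illustration of the transient Markovian analysis mentioned above, the sketch below computes instantaneous and interval availability for a single repairable cell modeled as a two-state continuous-time Markov chain; the failure and repair rates are assumed values, and the paper's hybrid-decomposed multi-cell model is much richer.

```python
import numpy as np
from scipy.linalg import expm

# Two-state CTMC (0 = up, 1 = down) with assumed failure and repair rates.
lam, mu = 0.01, 0.5                      # failure rate, repair rate (per hour)
Q = np.array([[-lam, lam],
              [ mu, -mu]])               # CTMC generator matrix
p0 = np.array([1.0, 0.0])                # start in the "up" state

for t in (1.0, 10.0, 100.0):
    pt = p0 @ expm(Q * t)                # transient state probabilities p(t) = p(0) e^{Qt}
    print(f"t={t:6.1f} h  instantaneous availability={pt[0]:.4f}")

# Interval availability over [0, T]: time-average of the up-state probability,
# approximated here by numerical integration on a fine grid.
T = 100.0
grid = np.linspace(0.0, T, 1001)
avail = np.mean([(p0 @ expm(Q * t))[0] for t in grid])
print(f"interval availability over [0,{T:.0f}] ~ {avail:.4f}")
```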

Optimal Lot-sizing and Pricing with Markdown for a Newsvendor Problem

  • Chen, Jen-Ming;Chen, Yi-Shen;Chien, Mei-Chen
    • Industrial Engineering and Management Systems
    • /
    • v.7 no.3
    • /
    • pp.257-265
    • /
    • 2008
  • This paper deals with joint decisions on pricing and ordering for a monopolistic retailer who sells perishable goods with a fixed lifetime or demand period. The newsvendor-type problem is formulated as a two-period inventory system in which the first period represents the inventory of fresh or newly arrived items and the second period represents the inventory of items that are older but still usable. Demand may be for either fresh items or somewhat older items that exhibit physical decay or deterioration. The retailer is allowed to adjust the selling price of the deteriorated items in the second period, which stimulates demand and reduces excess season-end or stale inventory. The paper develops a stochastic dynamic programming model that solves the problem of preseason ordering and pricing decisions together with a within-season markdown pricing decision. We also develop a fixed-price model as a benchmark against the dual-price dynamic model. To illustrate the effect of the dual-price policy on expected profit, we conduct a comparative study between the two models. Extension to a generalized multi-period model is also discussed.
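
The following is a simplified numerical sketch of a two-period newsvendor with a markdown, assuming Poisson, price-dependent demand and toy prices; it searches over preseason decisions only, whereas the paper's dynamic program chooses the markdown within the season.

```python
import numpy as np

rng = np.random.default_rng(0)

# Order Q before the season, sell at price p1 in period 1, then sell any
# leftovers at a markdown price p2 < p1 in period 2. Demand model, prices,
# and costs are all assumed; mean demand is modeled crudely as 60 - 4*p.
c, salvage = 4.0, 1.0

def mean_demand(p):
    return max(60.0 - 4.0 * p, 1.0)

def expected_profit(Q, p1, p2, n=20_000):
    d1 = rng.poisson(mean_demand(p1), size=n)       # period-1 demand
    d2 = rng.poisson(mean_demand(p2), size=n)       # period-2 (markdown) demand
    sold1 = np.minimum(Q, d1)
    left = Q - sold1
    sold2 = np.minimum(left, d2)
    revenue = p1 * sold1 + p2 * sold2 + salvage * (left - sold2)
    return float(np.mean(revenue - c * Q))

# Brute-force search over the order quantity and the two prices.
best = max(((expected_profit(Q, p1, p2), Q, p1, p2)
            for Q in range(10, 61, 5)
            for p1 in (8.0, 10.0, 12.0)
            for p2 in (5.0, 6.0, 7.0)), key=lambda x: x[0])
print(f"profit~{best[0]:.1f}  Q={best[1]}  p1={best[2]}  p2={best[3]}")
```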

The Evaluation of Long-Term Generation Portfolio Considering Uncertainty (불확실성을 고려한 장기 전원 포트폴리오의 평가)

  • Chung, Jae-Woo;Min, Dai-Ki
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.37 no.3
    • /
    • pp.135-150
    • /
    • 2012
  • This paper presents a portfolio model for a long-term power generation mix problem. The proposed portfolio model evaluates generation mixes by considering the tradeoff between the expected cost of power generation and its variability. Unlike conventional portfolio models that measure variance, we introduce Conditional Value-at-Risk (CVaR) in modeling the variability, with the aim of capturing events that are rare but enormously expensive, such as nuclear power plant accidents. Further, we consider uncertainties associated with future electricity demand, fuel prices and their correlations, and capital costs for power plant investments. To determine the generation by each energy source, we employ the sample average approximation method, which approximates the stochastic objective function by the average of a large number of sample values and thus provides asymptotic convergence to optimal solutions; the method uses Monte Carlo simulation to generate random samples from multivariate distributions. Applications of the proposed model and method are demonstrated through a case study of an electricity industry with nuclear, coal, oil (OCGT), and LNG (CCGT) generation in South Korea.
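
A minimal sample-average-approximation sketch of a mean-CVaR generation-mix comparison is given below; the cost distributions, correlation structure, and candidate mixes are toy assumptions rather than the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample average approximation of a mean-CVaR objective over a few candidate
# generation mixes. Unit costs and covariances are assumed toy values.
techs = ["nuclear", "coal", "LNG"]
mean_cost = np.array([20.0, 40.0, 80.0])        # cost per MWh (assumed)
cov = np.array([[ 25.0,  5.0,   2.0],
                [  5.0, 64.0,  30.0],
                [  2.0, 30.0, 225.0]])          # fuel-price covariance (assumed)
ALPHA, LAMBDA, N = 0.95, 0.5, 50_000

def cvar(losses, alpha):
    var = np.quantile(losses, alpha)            # empirical Value-at-Risk
    return var + np.mean(np.maximum(losses - var, 0.0)) / (1.0 - alpha)

samples = rng.multivariate_normal(mean_cost, cov, size=N)   # Monte Carlo fuel costs
candidates = [np.array(w) for w in
              [(0.5, 0.3, 0.2), (0.3, 0.5, 0.2), (0.2, 0.3, 0.5), (0.4, 0.4, 0.2)]]

def score(w):
    total = samples @ w                          # sampled system cost per MWh
    return (1 - LAMBDA) * total.mean() + LAMBDA * cvar(total, ALPHA)

best = min(candidates, key=score)
print(dict(zip(techs, best)), round(score(best), 2))
```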

Flexible operation and maintenance optimization of aging cyber-physical energy systems by deep reinforcement learning

  • Zhaojun Hao;Francesco Di Maio;Enrico Zio
    • Nuclear Engineering and Technology
    • /
    • v.56 no.4
    • /
    • pp.1472-1479
    • /
    • 2024
  • Cyber-Physical Energy Systems (CPESs) integrate cyber and hardware components to ensure reliable and safe physical power production and supply. Renewable Energy Sources (RESs) add uncertainty to energy demand, which can be dealt with by flexible operation (e.g., load-following) of the CPES; at the same time, scenarios that could result in severe consequences due to both stochastic component failures and aging of the cyber system of the CPES (commonly overlooked) must be accounted for in Operation & Maintenance (O&M) planning. In this paper, we use Deep Reinforcement Learning (DRL) to search for the optimal O&M strategy that considers not only the actual health conditions of the system hardware components and their Remaining Useful Life (RUL), but also the possible accident scenarios caused by the failures of the hardware components and the aging of the cyber components, respectively. The novelty of the work lies in embedding the cyber aging model into the CPES model of production planning and the failure process; this model is used to help the RL agent, trained with Proximal Policy Optimization (PPO) and Imitation Learning (IL), find the proper rejuvenation timing for the cyber system while accounting for the uncertainty of the cyber-system aging process. An application is provided with regard to the Advanced Lead-cooled Fast Reactor European Demonstrator (ALFRED).
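
As an illustration of the PPO component only, the snippet below implements the standard clipped surrogate loss; the O&M states and actions are assumed, and the imitation-learning warm start is only indicated in a comment.

```python
import torch

# PPO clipped surrogate loss, as used to train an O&M agent whose actions
# would be maintenance / rejuvenation decisions (assumed, not the paper's code).
EPS_CLIP = 0.2

def ppo_loss(logp_new, logp_old, advantages):
    """Clipped policy-gradient objective (returned as a loss to be minimized)."""
    ratio = torch.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - EPS_CLIP, 1 + EPS_CLIP) * advantages
    return -torch.mean(torch.min(unclipped, clipped)) # maximize the surrogate

# An imitation-learning warm start could simply add a cross-entropy loss against
# actions suggested by a heuristic maintenance policy (assumed, not shown here).
logp_new = torch.log(torch.tensor([0.6, 0.3, 0.8]))
logp_old = torch.log(torch.tensor([0.5, 0.4, 0.7]))
adv = torch.tensor([1.2, -0.5, 0.3])
print(ppo_loss(logp_new, logp_old, adv))
```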

Analysis of Reinforcement Learning Methods for BS Switching Operation (기지국 상태 조정을 위한 강화 학습 기법 분석)

  • Park, Hyebin;Lim, Yujin
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.2
    • /
    • pp.351-358
    • /
    • 2018
  • Reinforcement learning is a machine learning method that aims to determine a policy yielding optimal actions in dynamic and stochastic environments. However, reinforcement learning has high computational complexity and needs a long time to reach a solution, so it is not easily applicable to uncertain and continuous environments. To tackle the complexity problem, the actor-critic (AC) method is used; it separates the action-value function into a value function and an action-decision policy. Also, in transfer learning, the knowledge constructed in one environment is adapted to another environment, which reduces the learning time of a reinforcement learning method. In this paper, we present the AC method and the transfer learning method as remedies for these problems of reinforcement learning. Finally, we analyze a case study in which transfer learning is used to solve the base station (BS) switching problem in wireless access networks.
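
To make the actor-critic idea concrete, a one-step advantage actor-critic update for a discrete base-station on/off action is sketched below; the state features, network sizes, and reward are assumptions for illustration, and the transfer-learning step is not shown.

```python
import torch
import torch.nn as nn

# One-step actor-critic update for a discrete on/off decision per base station;
# state features (e.g., traffic load, time of day) and dimensions are assumed.
STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99
actor = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
critic = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def update(s, a, r, s2, done):
    """Single-transition update: the critic learns V(s), the actor follows the advantage."""
    v, v2 = critic(s), critic(s2).detach()
    td_target = r + GAMMA * (1.0 - done) * v2
    advantage = (td_target - v).detach()
    critic_loss = nn.functional.mse_loss(v, td_target)
    logp = torch.log_softmax(actor(s), dim=-1)[a]    # log-probability of the taken action
    actor_loss = -advantage * logp
    opt.zero_grad(); (critic_loss + actor_loss).backward(); opt.step()

s = torch.randn(STATE_DIM)                            # e.g., normalized BS load features
update(s, a=1, r=-0.2, s2=torch.randn(STATE_DIM), done=0.0)
```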