• Title/Summary/Keyword: markov decision problem

Fault-Tolerant Tasking and Guidance of an Airborne Location Sensor Network

  • Wu, N.Eva;Guo, Yan;Huang, Kun;Ruschmann, Matthew C.;Fowler, Mark L.
    • International Journal of Control, Automation, and Systems / v.6 no.3 / pp.351-363 / 2008
  • This paper is concerned with tasking and guidance of networked airborne sensors to achieve fault-tolerant sensing. The sensors are coordinated to locate hostile transmitters by intercepting and processing their signals. Faults occur when some sensor-carrying vehicles engaged in target location missions are lost. Faults effectively change the network architecture and therefore degrade the network performance. The first objective of the paper is to optimally allocate a finite number of sensors to targets to maximize the network life and availability. To that end, allocation policies are obtained by solving the relevant Markov decision problems. The sensors allocated to a target must continue to adjust their trajectories until the estimate of the target location reaches a prescribed accuracy. The second objective of the paper is to establish a criterion for vehicle guidance under which fault-tolerant sensing is achieved by incorporating the knowledge of vehicle loss probability and by allowing network reconfiguration in the event of loss of vehicles. Superior sensing performance in terms of location accuracy is demonstrated under the established criterion.

Applying the Bi-level HMM for Robust Voice-activity Detection

  • Hwang, Yongwon;Jeong, Mun-Ho;Oh, Sang-Rok;Kim, Il-Hwan
    • Journal of Electrical Engineering and Technology / v.12 no.1 / pp.373-377 / 2017
  • This paper presents a voice-activity detection (VAD) method for sound sequences with various SNRs. For real-time VAD applications, post-processing to remove burst clippings from the VAD output decision is inadequate. To tackle this problem, we build on the bi-level hidden Markov model, in which a state layer is inserted into a typical hidden Markov model (HMM), and formulate a robust VAD method that requires no additional post-processing. In the method, a forward-inference-ratio test is devised to detect the speech endpoints, and Mel-frequency cepstral coefficients (MFCCs) are used as the features. Our experimental results show that, across different SNRs, the proposed approach outperforms conventional methods.

Sparse Data Cleaning using Multiple Imputations

  • Jun, Sung-Hae;Lee, Seung-Joo;Oh, Kyung-Whan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.1 / pp.119-124 / 2004
  • Real data such as web log files tend to be incomplete, yet we must extract useful knowledge from them to support optimal decisions. Web log data contain much useful information, such as hyperlink information and the web usage of connected users. However, web data are too large to use for effective knowledge discovery and, to make matters worse, they are very sparse. We overcome this sparsity problem by using the Markov chain Monte Carlo (MCMC) method for multiple imputation. This missing-value imputation turns sparse web data into complete data. Our study may be a useful tool for discovering knowledge from sparse data sets. The sparser the data, the better MCMC imputation performs. We verified our work through experiments using data from the UCI Machine Learning Repository.

A Study on the Criteria to Decide the Number of Aircrafts Considering Operational Characteristics (항공기 운용 특성을 고려한 적정 운용 대수 산정 기준 연구)

  • Son, Young-Su;Kim, Seong-Woo;Yoon, Bong-Kyoo
    • Journal of the Korea Institute of Military Science and Technology / v.17 no.1 / pp.41-49 / 2014
  • In this paper, we consider a method to assess the required number of aircraft, which is a strategic variable in national security. This problem has become more important in light of the ROKAF's F-X and KF-X projects. Traditionally, the ATO (Air Tasking Order) and the fighting power index have been used to evaluate the number of aircraft required by the ROKAF. However, those methods consider only the static aspects of the aircraft requirement. This paper presents a model that accommodates the dynamic features of the aircraft requirement using an absorbing Markov chain. In conclusion, we suggest a dynamic model for evaluating the number of aircraft required, with key decision variables such as the destroying rate, failure rate, and repair rate.
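
As a rough illustration of the absorbing-Markov-chain idea mentioned in this abstract (not the authors' model), the sketch below computes the expected number of periods an aircraft remains in the fleet before it is destroyed. The two transient states (operational, under repair), the single absorbing state (destroyed), the numeric rates, and the function name `expected_steps_to_absorption` are all hypothetical.

```python
def expected_steps_to_absorption(Q):
    """Solve (I - Q) t = 1 by Gauss-Jordan elimination.

    Q holds transition probabilities among transient states only;
    t[i] is the expected number of steps before absorption from state i.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n] for row in A]


# Hypothetical per-period rates (assumptions, not taken from the paper).
failure, destroy, repair = 0.10, 0.02, 0.50
Q = [
    [1 - failure - destroy, failure],  # operational -> operational / repair
    [repair, 1 - repair],              # repair      -> operational / repair
]
t = expected_steps_to_absorption(Q)
```

With these made-up rates, `t` is approximately `[60, 62]`: an aircraft that starts operational is expected to last about 60 periods before being destroyed. Varying the three rates shows how the expected fleet lifetime responds to each one.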

An Inspection-Maintenance Policy for a System with Various Types of Maintenance (다수의 보수형태를 갖는 시스템에서의 검사.보수정책)

  • 이창훈;홍성희
    • Journal of the Korean Operations Research and Management Science Society / v.6 no.2 / pp.7-11 / 1981
  • An inspection-maintenance policy is investigated for a system having various states. A policy is characterized by the type of maintenance and the next inspection time. Maintenance actions are classified into various types according to the depth of maintenance. The policy evaluation criterion is the expected cost accumulated up to the failure of the system. The problem is formulated as a Markov decision process, and an optimal policy is found by using a policy improvement procedure. A numerical example illustrates the policy for a system having five states.
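
The policy improvement procedure named in this abstract can be sketched generically as policy iteration. The example below is a minimal sketch under stated assumptions, not the paper's formulation: it uses a discounted-cost criterion rather than cost accumulated up to failure, and the three states (good, worn, failed), two actions, transition probabilities, costs, and discount factor are all invented for illustration.

```python
GAMMA = 0.9  # discount factor (assumed)

# P[a][s][s2]: transition probabilities; C[a][s]: one-step cost.
# States: 0 = good, 1 = worn, 2 = failed (a failed unit is always replaced).
P = [
    [[0.8, 0.2, 0.0], [0.0, 0.6, 0.4], [1.0, 0.0, 0.0]],  # action 0: do nothing
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]],  # action 1: maintain
]
C = [
    [0.0, 0.0, 10.0],  # doing nothing is free until a replacement is forced
    [1.0, 2.0, 10.0],  # maintenance costs more on a worn unit
]


def q_value(s, a, V):
    """One-step cost plus discounted expected cost-to-go."""
    return C[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], V))


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation (in-place sweeps until convergence)."""
    V = [0.0] * len(policy)
    while True:
        delta = 0.0
        for s, a in enumerate(policy):
            new_v = q_value(s, a, V)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0, 0, 0]
    while True:
        V = evaluate(policy)
        improved = [min((0, 1), key=lambda a: q_value(s, a, V))
                    for s in range(3)]
        if improved == policy:
            return policy, V
        policy = improved


policy, V = policy_iteration()
```

For these invented numbers the procedure settles on doing nothing in the good state and maintaining in the worn state. The failed state is a tie because both actions there are identical, so the first action is reported.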

Learning Multi-Character Competition in Markov Games (마르코프 게임 학습에 기초한 다수 캐릭터의 경쟁적 상호작용 애니메이션 합성)

  • Lee, Kang-Hoon
    • Journal of the Korea Computer Graphics Society / v.15 no.2 / pp.9-17 / 2009
  • Animating multiple characters to compete with each other is an important problem in computer games and animated films. However, it remains difficult to simulate strategic competition among characters because of the inherently complex decision process, which must cope with the often unpredictable behavior of opponents. We apply a reinforcement learning method for Markov games to action models built from captured motion data. This enables two characters to perform globally optimal counter-strategies with respect to each other. We also extend this method to simulate competition between two teams, each of which can consist of an arbitrary number of characters. We demonstrate the usefulness of our approach through various competitive scenarios, including playing tag, keeping distance, and shooting.

Hierarchical Power Management Architecture and Optimal Local Control Policy for Energy Efficient Networks

  • Wei, Yifei;Wang, Xiaojun;Fialho, Leonardo;Bruschi, Roberto;Ormond, Olga;Collier, Martin
    • Journal of Communications and Networks / v.18 no.4 / pp.540-550 / 2016
  • Since energy efficiency has become a significant concern for network infrastructure, next-generation network devices are expected to have embedded advanced power management capabilities. However, exploiting these green capabilities effectively remains a major challenge, especially given the high heterogeneity of devices and their internal architectures. In this paper, we introduce a hierarchical power management architecture (HPMA) which represents, as entities, physical components whose power can be monitored and controlled at various levels of a device. We use the energy-aware state (EAS) as the power management setting mode of each device entity. The power policy controller can obtain information on how many EASes of an entity are manageable inside a device and can set a particular EAS configuration for the entity. We propose an optimal local control policy which aims to minimize the router power consumption while meeting the performance constraints. A first-order Markov chain is used to model the statistical features of the network traffic load. The dynamic EAS configuration problem is formulated as a Markov decision process and solved using a dynamic programming algorithm. In addition, we demonstrate a reference implementation of the HPMA and EAS concept in a NetFPGA frequency-scaled router that can toggle among five operating frequency options and/or turn off unused Ethernet ports.

Efficient Approximation of State Space for Reinforcement Learning Using Complex Network Models (복잡계망 모델을 사용한 강화 학습 상태 공간의 효율적인 근사)

  • Yi, Seung-Joon;Eom, Jae-Hong;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.36 no.6 / pp.479-490 / 2009
  • A number of temporal abstraction approaches have been suggested so far to handle the high computational complexity of Markov decision problems (MDPs). Although the structure of a temporal abstraction can significantly affect the efficiency of solving the MDP, to our knowledge none of the current temporal abstraction approaches explicitly considers the relationship between topology and efficiency. In this paper, we first show that a topological measure from the complex network literature, the mean geodesic distance, can reflect the efficiency of solving an MDP. Based on this, we develop an incremental method that systematically builds temporal abstractions using a network model that guarantees a small mean geodesic distance. We test our algorithm on a realistic 3D game environment, and experimental results show that the mean geodesic distance in our model grows subpolynomially with problem size, which enables the resulting MDP to be solved efficiently.
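
The mean geodesic distance referred to in this abstract is the average shortest-path length between pairs of nodes. The sketch below computes it with breadth-first search on an unweighted graph. It is an illustration only: the six-state chain, the extra shortcut edge standing in for a temporal abstraction, and the function name are assumptions, not the paper's method.

```python
from collections import deque


def mean_geodesic_distance(adj):
    """Average shortest-path length over all ordered pairs (source, target)
    of distinct nodes where target is reachable from source."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical state graph: a chain of six states 0-1-2-3-4-5.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

# Same graph with one shortcut edge between the two ends of the chain.
shortcut = {k: list(v) for k, v in chain.items()}
shortcut[0].append(5)
shortcut[5].append(0)

d_chain = mean_geodesic_distance(chain)
d_shortcut = mean_geodesic_distance(shortcut)
```

Here the single shortcut lowers the mean geodesic distance from about 2.33 to 1.8, which gives a feel for why a well-placed abstraction can shorten the paths along which value information has to propagate.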

Reinforcement Learning-based Dynamic Weapon Assignment to Multi-Caliber Long-Range Artillery Attacks (다종 장사정포 공격에 대한 강화학습 기반의 동적 무기할당)

  • Hyeonho Kim;Jung Hun Kim;Joohoe Kong;Ji Hoon Kyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.42-52 / 2022
  • North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea began developing an anti-artillery interception system similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system may not work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view the assignment task as a dynamic weapon target assignment (DWTA) problem. DWTA is a multistage decision process in which the decision at one stage affects the decision processes and their results in subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers positioned near the border limits the processing time of the model solver to only a few seconds. It is impossible to compute the exact optimal solution within the allowed time interval because of the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement learning-based algorithms to obtain an approximate solution of the MDP model within the time limit. To check the quality of the approximate solution, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results show that both algorithms provide better solutions than the baseline policy.

Computation Offloading with Resource Allocation Based on DDPG in MEC

  • Sungwon Moon;Yujin Lim
    • Journal of Information Processing Systems / v.20 no.2 / pp.226-238 / 2024
  • Recently, multi-access edge computing (MEC) has emerged as a promising technology to alleviate the computing burden of vehicular terminals and efficiently facilitate vehicular applications. Vehicles can improve the quality of experience of their applications by offloading tasks to MEC servers. However, channel conditions are time-varying due to channel interference among vehicles, and path loss is time-varying due to the mobility of vehicles. The task arrival of vehicles is also stochastic. Therefore, it is difficult to determine an optimal offloading and resource allocation decision in the dynamic MEC system, because offloading is affected by wireless data transmission. In this paper, we study computation offloading with resource allocation in the dynamic MEC system. The objective is to minimize power consumption and maximize throughput while meeting the delay constraints of tasks. To this end, the method allocates resources for local execution and transmission power for offloading. We define the problem as a Markov decision process and propose an offloading method based on a deep reinforcement learning algorithm, deep deterministic policy gradient (DDPG). Simulations show that the proposed method outperforms existing methods in terms of throughput and satisfaction of delay constraints.