• Title/Summary/Keyword: POMDP

POMDP-based Human-Robot Interaction Behavior Model (POMDP 기반 사용자-로봇 인터랙션 행동 모델)

  • Kim, Jong-Cheol
    • Journal of Institute of Control, Robotics and Systems, v.20 no.6, pp.599-605, 2014
  • This paper presents an interactive behavior modeling method based on POMDP (Partially Observable Markov Decision Process) for HRI (Human-Robot Interaction). HRI resembles conversational interaction in that information flows back and forth between a human and a robot, and the POMDP has been popular in conversational interaction systems because it can efficiently handle the uncertainty of the observable variables. In the proposed conversational HRI system, the POMDP input variables are the sensor readings and the log of the services used, and the output variables are the names of robot behaviors, where a behavior is the motion produced through the LED, LCD, motors, and sound. The proposed conversational POMDP-based HRI system was applied to the emotional robot KIBOT; in human-KIBOT interaction, it showed flexible robot behavior in the real world.
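
As a concrete illustration of the loop such a system runs, here is a minimal sketch that maintains a belief over user-intent states, updates it from sensor observations with a Bayes filter, and picks the robot behavior with the highest expected immediate reward. The states, behaviors, observations, and probabilities are illustrative stand-ins, not values from the paper.

```python
# Minimal POMDP belief-update loop for a conversational HRI system.
# All names and numbers below are hypothetical, for illustration only.
import numpy as np

states = ["idle", "greeting", "playing"]             # hypothetical user intents
behaviors = ["wave_arm", "blink_led", "play_sound"]  # hypothetical robot behaviors
observations = ["touch", "voice", "none"]            # hypothetical sensor events

# T[a][s, s']: state transition given behavior a (uniform placeholder)
T = {a: np.full((3, 3), 1 / 3) for a in behaviors}
# O[a][s', o]: probability of observing o after behavior a lands in s'
O = {a: np.full((3, 3), 1 / 3) for a in behaviors}
# R[a][s]: immediate reward of behavior a in state s (illustrative)
R = {"wave_arm": np.array([0.0, 1.0, 0.2]),
     "blink_led": np.array([0.1, 0.3, 0.5]),
     "play_sound": np.array([0.0, 0.5, 1.0])}

def belief_update(b, a, o_idx):
    """Bayes filter: b'(s') is proportional to O(o|s',a) * sum_s T(s'|s,a) b(s)."""
    b_new = O[a][:, o_idx] * (T[a].T @ b)
    return b_new / b_new.sum()

def greedy_behavior(b):
    """One-step lookahead: behavior with the highest expected reward."""
    return max(behaviors, key=lambda a: float(R[a] @ b))

b = np.array([1.0, 0.0, 0.0])   # start certain the user is idle
a = greedy_behavior(b)
b = belief_update(b, a, observations.index("voice"))
print(a, b)
```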

Case Studies on Planning and Learning for Large-Scale CGFs with POMDPs through Counterfire and Mechanized Infantry Scenarios (대화력전 및 기계화 보병 시나리오를 통한 대규모 가상군의 POMDP 행동계획 및 학습 사례연구)

  • Lee, Jongmin;Hong, Jungpyo;Park, Jaeyoung;Lee, Kanghoon;Kim, Kee-Eung;Moon, Il-Chul;Park, Jae-Hyun
    • KIISE Transactions on Computing Practices, v.23 no.6, pp.343-349, 2017
  • Combat modeling and simulation (M&S) of large-scale computer generated forces (CGFs) enables the development of even the most sophisticated strategies of combat warfare and efficiently facilitates a comprehensive simulation of the upcoming battle. The DEVS-POMDP framework hierarchically combines a DEVS model, which describes the explicit behavior rules in military doctrine, with a POMDP model, which describes the autonomous behavior of the CGFs, to capture the complexity of realistic combat M&S. However, it is well documented that computing the optimal policy of a POMDP model is computationally demanding. In this paper, we show through case studies on a counterfire warfare scenario and a mechanized infantry brigade's offensive operation scenario that an efficient POMDP tree search algorithm not only improves the performance of the CGFs but also lets them conveniently learn the behavior model of the enemy.
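
The abstract does not spell out the tree search algorithm, so the sketch below only shows the general shape of Monte-Carlo planning for POMDPs (in the spirit of POMCP): sample states from a particle belief, push them through a black-box simulator, and score each action by its average sampled return. The simulator stub, action set, and parameters are invented for illustration.

```python
# Simplified Monte-Carlo planning sketch for POMDPs: evaluate each action
# by sampled rollouts from a particle belief. Everything here is illustrative;
# a real CGF simulator (e.g., a DEVS model) would replace the stub.
import random

def simulate_step(state, action):
    """Hypothetical generative model: (next_state, observation, reward)."""
    next_state = (state + action) % 10
    return next_state, next_state % 2, 1.0 if next_state == 0 else 0.0

def rollout(state, depth, gamma=0.95):
    """Estimate value by following a uniform-random policy to the horizon."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = random.choice([0, 1, 2])
        state, _, reward = simulate_step(state, action)
        total += discount * reward
        discount *= gamma
    return total

def plan(particles, actions=(0, 1, 2), n_sims=200, depth=20):
    """Pick the action with the best average sampled return."""
    def value(a):
        returns = []
        for _ in range(n_sims):
            s = random.choice(particles)        # sample a state from the belief
            s2, _, r = simulate_step(s, a)
            returns.append(r + 0.95 * rollout(s2, depth - 1))
        return sum(returns) / len(returns)
    return max(actions, key=value)

belief_particles = [random.randrange(10) for _ in range(100)]
print("chosen action:", plan(belief_particles))
```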

A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.31 no.3B, pp.175-182, 2006
  • In this paper, we propose a localized adaptive QoS routing scheme using POMDP and exploration bonus techniques. We also show that the CEA technique, which works with expectation values, can simplify the POMDP problem, since performing dynamic programming to solve a POMDP exactly is computationally very expensive. An exploration bonus is used to search for detour paths that are better than the current path, and for this purpose we propose an algorithm (SEMA) that searches multiple paths. In particular, we evaluate the service success rate and the average hop count with respect to the performance parameters $\phi$ and $k$, defined as the exploration count and the exploration interval. The results show that the larger $\phi$ is, the better the detour path search, and that increasing $k$ increases the amount of exploration.
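
A minimal sketch of the exploration-bonus idea, under the assumption that the bonus grows with the time since a path was last tried; the scoring formula, class, and parameter names are illustrative, not the SEMA algorithm itself.

```python
# Next-hop selection with an exploration bonus: prefer the path with the
# best estimated success rate, plus a bonus that grows while a path goes
# untried. Formula and parameters are illustrative, not the paper's.
import math

class PathStats:
    def __init__(self):
        self.successes, self.attempts, self.last_tried = 0, 0, 0

def choose_path(paths, now, phi=0.5):
    """phi scales the exploration bonus; larger phi explores detours more."""
    def score(name):
        stats = paths[name]
        success_rate = stats.successes / stats.attempts if stats.attempts else 0.5
        bonus = phi * math.sqrt(now - stats.last_tried)  # grows while unexplored
        return success_rate + bonus
    return max(paths, key=score)

paths = {"primary": PathStats(), "detour_1": PathStats(), "detour_2": PathStats()}
paths["primary"].successes, paths["primary"].attempts = 8, 10
paths["primary"].last_tried = 9
print(choose_path(paths, now=10))   # untried detours get a large bonus
```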

A Case Study on Modeling Computer Generated Forces based on Factored POMDPs (Factored POMDP를 이용한 가상군의 자율행위 모델링 사례연구)

  • Lee, Kang-Hoon;Lim, Hee-Jin;Kim, Kee-Eung
    • Proceedings of the Korean Information Science Society Conference, 2012.06b, pp.333-335, 2012
  • Autonomous behavior modeling of computer generated forces (CGFs) is a key factor in the performance of war-game modeling and simulation systems. The POMDP (partially observable Markov decision process) model, which enables optimal decision making by reasoning probabilistically about uncertain situations, is a very natural framework for modeling the autonomous behavior of CGFs. However, the high computational complexity of the POMDP model makes it difficult to compute optimal behavior policies, which hinders its use for modeling autonomous CGF behavior. In this paper, we use a factored POMDP model to model the autonomous behavior of large-scale CGFs, and we confirm its effectiveness through a "Hasty Defense" case study.
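
The factored representation is what keeps large models tractable: the state is a tuple of variables and the transition probability factors into small per-variable conditionals, as in a dynamic Bayesian network, instead of one flat |S| x |S| matrix per action. A toy sketch with invented CGF state variables:

```python
# Factored transition model sketch. State variables and dynamics below
# are invented for illustration, not taken from the case study.
from itertools import product

# State: (position 0..4, ammo 0..2, enemy_seen 0/1).
# Flat model: 5*3*2 = 30 states -> a 30x30 matrix per action.
# Factored model: each variable depends only on a few parents.

def p_position(pos_next, pos, action):
    """Position advances with 'move', else stays (deterministic here)."""
    target = min(pos + 1, 4) if action == "move" else pos
    return 1.0 if pos_next == target else 0.0

def p_ammo(ammo_next, ammo, action):
    """Firing consumes one unit of ammo."""
    target = max(ammo - 1, 0) if action == "fire" else ammo
    return 1.0 if ammo_next == target else 0.0

def p_enemy_seen(seen_next, pos_next):
    """Chance of spotting the enemy grows with position (illustrative)."""
    p_see = 0.2 * pos_next
    return p_see if seen_next == 1 else 1.0 - p_see

def transition_prob(s, a, s2):
    """P(s'|s,a) is the product of the per-variable conditionals."""
    (pos, ammo, _), (pos2, ammo2, seen2) = s, s2
    return p_position(pos2, pos, a) * p_ammo(ammo2, ammo, a) * p_enemy_seen(seen2, pos2)

total = sum(transition_prob((2, 1, 0), "move", s2)
            for s2 in product(range(5), range(3), range(2)))
print(f"probabilities sum to {total:.1f}")   # sanity check: 1.0
```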

Labeling Q-Learning for Maze Problems with Partially Observable States

  • Lee, Hae-Yeon;Kamaya, Hiroyuki;Abe, Kenichi
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference, 2000.10a, pp.489-489, 2000
  • Recently, reinforcement learning (RL) methods have been used for learning problems in partially observable Markov decision process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs, and several algorithms have been proposed to overcome the partial observability [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which reinforces incomplete perceptual information with labeling. In LQ-learning, the agent perceives the current state as a pair of an observation and its label, which lets it distinguish states that look the same more exactly. Labeling is carried out by a hash-like function, which we call the labeling function (LF). Numerous labeling functions can be considered, but in this paper we introduce several that are based on only the 2 or 3 most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems (simple POMDP environments), and show with empirical results that it performs better than conventional RL algorithms.
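
A minimal sketch of the LQ-learning idea as described: the agent's effective state is an (observation, label) pair, with the label computed by a hash-like function of the most recent observations, so perceptually aliased maze cells become distinguishable. The maze trace and the particular labeling function are illustrative.

```python
# Q-learning over (observation, label) states, where the label is a
# hash of the last two observations. Maze and rewards are illustrative.
from collections import defaultdict, deque

ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)                    # Q[((obs, label), action)]

def labeling_function(history):
    """Hash the recent observation window into a small label set."""
    return hash(tuple(history)) % 8

def lq_step(history, obs, action, reward, next_obs, alpha=0.1, gamma=0.9):
    """One Q-learning update over (observation, label) states."""
    state = (obs, labeling_function(history))
    history.append(obs)                   # deque keeps only the recent window
    next_state = (next_obs, labeling_function(history))
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

history = deque(maxlen=2)
# Two maze cells that emit the same observation "corridor" receive
# different labels once their recent histories differ.
lq_step(history, "corridor", "up", 0.0, "corridor")
lq_step(history, "corridor", "right", 1.0, "goal")
print(dict(Q))
```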

Point-Based Value Iteration for Constrained POMDPs (제약을 갖는 POMDP를 위한 점-기반 가치 반복 알고리즘)

  • Kim, Dong-Ho;Lee, Jae-Song;Kim, Kee-Eung;Poupart, Pascal
    • Proceedings of the Korean Information Science Society Conference, 2011.06a, pp.286-289, 2011
  • The constrained partially observable Markov decision process (CPOMDP) extends the standard POMDP so that a policy optimizes the value function while satisfying constraints. Because a CPOMDP can naturally model problems with limited resources or with multiple objective functions, it is more practical than the standard POMDP. In this paper, we propose exact and approximate dynamic programming algorithms that compute the optimal and near-optimal stochastic policies of a CPOMDP. The exact algorithm must solve a minimax quadratically constrained program at every step of dynamic programming, whereas the approximate algorithm uses point-based value updates that require only linear programs. Experimental results show that stochastic policies outperform deterministic ones and that the approximate algorithm reduces computation time.
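
For orientation, the sketch below implements the standard unconstrained point-based backup that point-based CPOMDP algorithms build on; the constrained variant proposed in the paper, which adds a linear program to each update, is not reproduced here. The tiny random POMDP is illustrative.

```python
# Standard point-based value iteration (PBVI) backup over a fixed set of
# belief points, on a small random POMDP (illustrative only).
import numpy as np

n_s, n_a, n_o, gamma = 2, 2, 2, 0.95
T = np.random.dirichlet(np.ones(n_s), size=(n_a, n_s))   # T[a, s, s']
O = np.random.dirichlet(np.ones(n_o), size=(n_a, n_s))   # O[a, s', o]
R = np.random.rand(n_a, n_s)                              # R[a, s]

def backup(beliefs, alphas):
    """One PBVI step: best new alpha-vector at each belief point."""
    new_alphas = []
    for b in beliefs:
        best_alpha, best_val = None, -np.inf
        for a in range(n_a):
            alpha_a = R[a].copy()
            for o in range(n_o):
                # g_i[s] = gamma * sum_{s'} T(s,a,s') O(s',a,o) alpha_i(s')
                g = [gamma * T[a] @ (O[a][:, o] * al) for al in alphas]
                alpha_a += g[int(np.argmax([gi @ b for gi in g]))]
            if alpha_a @ b > best_val:
                best_alpha, best_val = alpha_a, alpha_a @ b
        new_alphas.append(best_alpha)
    return new_alphas

beliefs = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
alphas = [np.zeros(n_s)]
for _ in range(30):                  # value iteration over the belief points
    alphas = backup(beliefs, alphas)
print(np.array([max(al @ b for al in alphas) for b in beliefs]))
```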

Multimodal Dialog System Using Hidden Information State Dialog Manager (Hidden Information State 대화 관리자를 이용한 멀티모달 대화시스템)

  • Kim, Kyung-Duk;Lee, Geun-Bae
    • Proceedings of the KSPS conference, 2007.05a, pp.29-32, 2007
  • This paper describes a multimodal dialog system that uses the Hidden Information State (HIS) method to manage human-machine dialog. The HIS dialog manager is a variation of the classic partially observable Markov decision process (POMDP), which provides one of the stochastic dialog modeling frameworks. Because dialog modeling with a conventional POMDP requires a very large state space, it has been hard to apply POMDPs to real dialog system domains. The HIS dialog manager groups belief states to reduce the size of the state space, so that it can be used in real-world dialog domains. We adapted the HIS method to a multimodal dialog system for the smart-home domain.
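
A toy sketch of the grouping idea: belief mass is kept over partitions of user goals rather than over individual goals, and a partition splits only when an observation distinguishes its members. The goals and probabilities are invented for illustration and do not reflect the paper's smart-home ontology.

```python
# HIS-style partitioned belief: one partition initially covers all user
# goals; observations split partitions and redistribute belief mass.

partitions = [{"goals": {"turn_on_tv", "turn_off_tv", "dim_light", "play_music"},
               "belief": 1.0}]

def split(partitions, predicate, p_match):
    """Split each partition into (matches, rest); belief mass divides
    according to how strongly the observation supports each side."""
    result = []
    for part in partitions:
        inside = {g for g in part["goals"] if predicate(g)}
        outside = part["goals"] - inside
        if inside and outside:
            result.append({"goals": inside, "belief": part["belief"] * p_match})
            result.append({"goals": outside, "belief": part["belief"] * (1 - p_match)})
        else:
            result.append(part)
    total = sum(p["belief"] for p in result)
    for p in result:
        p["belief"] /= total                 # renormalize
    return result

# The recognizer thinks the user said something about the TV (p = 0.8):
partitions = split(partitions, lambda g: "tv" in g, p_match=0.8)
for p in partitions:
    print(sorted(p["goals"]), round(p["belief"], 2))
```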

POMDP based Dialogue Management System for Train Reservation Service (열차 예약을 위한 POMDP 기반의 대화 관리 시스템)

  • Sung, Joo Won;Eun, Jihyun;Kim, Hyunjeong;Chang, Du-Seong
    • Annual Conference on Human and Language Technology, 2008.10a, pp.167-171, 2008
  • This study examines the feasibility of providing a more natural and error-robust service by introducing a statistical dialog interface to the train reservation domain. User and system action types and state transition probabilities were extracted from a training corpus to derive a policy, and its performance was tested against a user model built from an evaluation corpus. Compared with example-based dialog policies, which require a large corpus to cover a vast range of scenarios, and MDP dialog policies, which cannot naturally reflect real-world uncertainty because they ignore recognizer errors and noise, the POMDP policy is expected to enable efficient and robust dialog services with little effort and cost, provided that efficient and fast training algorithms continue to be improved.
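
The corpus-driven step described here amounts to maximum-likelihood counting; the sketch below estimates dialog-state transition probabilities from annotated dialogs. The state labels are illustrative, not the paper's annotation scheme.

```python
# Estimating a dialog transition model P(s'|s) by counting transitions
# in annotated training dialogs. Labels are hypothetical examples.
from collections import Counter, defaultdict

corpus = [
    ["greeting", "ask_date", "ask_destination", "confirm", "book"],
    ["greeting", "ask_destination", "ask_date", "confirm", "book"],
    ["greeting", "ask_date", "confirm", "book"],
]

counts = defaultdict(Counter)
for dialog in corpus:
    for s, s_next in zip(dialog, dialog[1:]):
        counts[s][s_next] += 1

# Maximum-likelihood estimates from the counts.
T = {s: {s2: c / sum(nexts.values()) for s2, c in nexts.items()}
     for s, nexts in counts.items()}
print(T["greeting"])   # -> ask_date: 2/3, ask_destination: 1/3
```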

Robust position estimation using POMDP

  • Kang, Daehee
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference, 1996.10a, pp.328-333, 1996
  • In this paper, we propose a new method to estimate robot position without landmarks. First, we study how to estimate the robot state using a Markov decision rule, and we discuss a matching method for estimating the current position more accurately given the estimated state. Second, we combine, or fuse, the matching method with the POMDP method in order to estimate the position under a dynamically changing environment. Finally, we show through simulation results that our method estimates the position precisely and robustly, without accumulating error.
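
The belief machinery underneath landmark-free position estimation is Markov localization; the sketch below runs a one-dimensional histogram filter whose belief sharpens with each predict/correct cycle instead of accumulating drift. The corridor map and noise values are toy numbers, not the paper's sensor model.

```python
# Markov localization sketch: a histogram belief over grid cells,
# updated by a noisy motion model and a feature-matching likelihood.
import numpy as np

n_cells = 10
belief = np.full(n_cells, 1.0 / n_cells)   # start fully uncertain
doors = {2, 5, 8}                          # toy map: cells with a door

def predict(belief, p_move=0.8):
    """Motion model: the intended one-cell move succeeds with prob p_move."""
    moved = np.roll(belief, 1)             # wraps around; fine for a toy
    return p_move * moved + (1 - p_move) * belief

def correct(belief, saw_door, p_hit=0.9):
    """Measurement model: weight each cell by the sensor likelihood."""
    likelihood = np.array([(p_hit if (c in doors) == saw_door else 1 - p_hit)
                           for c in range(n_cells)])
    belief = belief * likelihood
    return belief / belief.sum()

# Move and sense repeatedly; the belief sharpens rather than drifting,
# so the position error is not cumulative.
for saw_door in [True, False, False, True]:
    belief = correct(predict(belief), saw_door)
print(np.argmax(belief), belief.round(2))
```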

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS), v.7 no.5, pp.1036-1057, 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome, and energy-efficient scheduling is one of them, given the scarcity of energy available to biosensors and the lack of portability. Researchers from academia, industry, and the health sector are therefore working together to realize practical solutions to these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov decision process (MDP) have been proposed to tackle this issue. An MDP is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward that depends on the action and the state, and the goal is to find a function, called a policy, that specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A partially observable Markov decision process (POMDP) is a generalization of the MDP that allows for incomplete information regarding the state of the system: the state is not visible to the agent. This has many applications in operations research and artificial intelligence, but the incomplete knowledge of the system makes formulating and solving POMDP models mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs, both in the general domain and in WBANs in particular. In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe this work will provide valuable insights for newcomers who would like to pursue related research in the WBAN domain.
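
For reference, the survey's verbal definitions correspond to the standard textbook formulas (standard notation, not reproduced from the paper):

```latex
% Belief update after taking action a and observing o:
\[
  b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}
                    {\sum_{s''} O(o \mid s'', a) \sum_{s} T(s'' \mid s, a)\, b(s)}
\]
% A policy \pi maps beliefs to actions so as to maximize the expected
% discounted sum of rewards:
\[
  V^{\pi}(b) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\,
                   r(b_t, \pi(b_t)) \;\middle|\; b_0 = b \right],
  \qquad 0 \le \gamma < 1 .
\]
```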