• Title/Abstract/Keyword: Learning state

Search results: 1,629 (processing time: 0.027 seconds)

미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습 (Online Reinforcement Learning to Search the Shortest Path in Maze Environments)

  • 김병천;김삼근;윤병주
    • 정보처리학회논문지B / Vol. 9B, No. 2 / pp.155-162 / 2002
  • Reinforcement learning is a learning method in which an agent learns by interacting with a dynamic environment through trial and error; it is classified into online reinforcement learning and delayed reinforcement learning. This paper proposes ONRELS (ONline REinforcement Learning System), an online reinforcement learning system that can quickly find the shortest path in a maze environment. Before making a state transition from the current state, ONRELS updates the evaluation values of all selectable (state, action) pairs and only then performs the transition. ONRELS first compresses the state space of the maze environment and then learns by interacting with the compressed environment through trial and error. Experiments show that, in maze environments, ONRELS finds the shortest path faster than Q-learning based on the TD error and $Q(\lambda)$-learning based on $TD(\lambda)$.
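
The details of ONRELS are not reproduced in this listing, but its baseline, tabular Q-learning driven by the TD error in a grid maze, can be sketched as follows. The maze layout, reward values, and hyperparameters here are illustrative assumptions, not the authors' settings.

```python
import random

# Illustrative 4x4 grid maze: 'S' start, 'G' goal, '#' wall (assumed layout).
MAZE = ["S..#",
        ".#..",
        "..#.",
        "#..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """One transition: moves into walls or off the grid stay in place; goal gives +1."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if not (0 <= nr < 4 and 0 <= nc < 4) or MAZE[nr][nc] == "#":
        nr, nc = r, c                                    # blocked move
    reward = 1.0 if MAZE[nr][nc] == "G" else -0.01       # small step cost (assumption)
    return (nr, nc), reward, MAZE[nr][nc] == "G"

Q = {((r, c), a): 0.0 for r in range(4) for c in range(4) for a in range(4)}
alpha, gamma, epsilon = 0.1, 0.95, 0.1                   # assumed hyperparameters

for episode in range(2000):
    state, done = (0, 0), False
    while not done:
        if random.random() < epsilon:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, ACTIONS[a])
        best_next = max(Q[(nxt, b)] for b in range(4))
        # TD-error update: the bracketed term is the TD error.
        Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
        state = nxt
```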

칼만-버쉬 필터 이론 기반 미분 신경회로망 학습 (Learning of Differential Neural Networks Based on Kalman-Bucy Filter Theory)

  • 조현철;김관형
    • 제어로봇시스템학회논문지 / Vol. 17, No. 8 / pp.777-782 / 2011
  • Neural network techniques are widely employed in signal processing, control systems, pattern recognition, and related fields. Learning of neural networks is an important procedure for accomplishing dynamic system modeling. This paper presents a novel learning approach for differential neural network models based on Kalman-Bucy filter theory. We construct an augmented state vector including the original neural state and parameter vectors and derive a state estimation rule that avoids the gradient function terms required by conventional neural learning methods such as the back-propagation approach. We carry out numerical simulation to evaluate the proposed learning approach in nonlinear system modeling. By comparison with the well-known back-propagation approach and Kalman-Bucy filtering, its superiority is additionally demonstrated under stochastic system environments.
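
The paper works in continuous time with a Kalman-Bucy filter over an augmented state of neural states and parameters; the sketch below is only a simplified, assumed discrete-time analogue, estimating the weights of a small static neural map with an extended Kalman filter so the gradient-free flavor of filter-based learning is visible.

```python
import numpy as np

# Discrete-time analogue (assumption): treat the weights w as a random-walk state,
# observe y_k = tanh(w . x_k) + noise, and estimate w with an extended Kalman filter.
rng = np.random.default_rng(0)
true_w = np.array([0.8, -0.5, 0.3])

def model(w, x):
    return np.tanh(w @ x)

w_est = np.zeros(3)                         # parameter-state estimate
P = np.eye(3)                               # state covariance
Q_cov, R_cov = 1e-5 * np.eye(3), 0.01       # assumed process / measurement noise

for k in range(500):
    x = rng.normal(size=3)                        # input sample
    y = model(true_w, x) + rng.normal(scale=0.1)  # noisy observation
    P = P + Q_cov                                 # prediction: random-walk model
    s = w_est @ x
    H = (1.0 - np.tanh(s) ** 2) * x               # Jacobian of tanh(w.x) w.r.t. w
    S = H @ P @ H + R_cov
    K = P @ H / S                                 # Kalman gain
    w_est = w_est + K * (y - model(w_est, x))     # measurement update
    P = P - np.outer(K, H) @ P

print("estimated weights:", w_est)
```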

임무수행을 위한 개선된 강화학습 방법 (An Improved Reinforcement Learning Technique for Mission Completion)

  • 권우영;이상훈;서일홍
    • 대한전기학회논문지:시스템및제어부문D / Vol. 52, No. 9 / pp.533-539 / 2003
  • Reinforcement learning (RL) has been widely used as a learning mechanism of an artificial life system. However, RL usually suffers from slow convergence to the optimum state-action sequence or a sequence of stimulus-response (SR) behaviors, and may not work correctly in non-Markov processes. In this paper, first, to cope with the slow-convergence problem, state-action pairs considered to be a disturbance to the optimum sequence are eliminated from long-term memory (LTM); such disturbances are found by a shortest-path-finding algorithm. This process is shown to give the system an enhanced learning speed. Second, to partly solve the non-Markov problem, if a stimulus is frequently met in the searching process, the stimulus is classified as a sequential percept for a non-Markov hidden state, and thus a correct behavior for a non-Markov hidden state can be learned as in a Markov environment. To show the validity of the proposed learning techniques, several simulation results are illustrated.
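
The abstract describes pruning "disturbance" state-action pairs out of long-term memory using a shortest-path-finding algorithm. A minimal illustration of that idea, not the authors' implementation, is to keep only the pairs lying on a BFS shortest path through a learned transition graph; the states, actions, and LTM contents below are hypothetical.

```python
from collections import deque

# Hypothetical long-term memory: (state, action) -> next state, learned from experience.
LTM = {("s0", "a0"): "s1", ("s0", "a1"): "s3",
       ("s1", "a0"): "s2", ("s3", "a0"): "s2",
       ("s2", "a0"): "goal", ("s1", "a1"): "s0"}

def shortest_path_pairs(ltm, start, goal):
    """BFS over the learned graph; return the state-action pairs on one shortest path."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            break
        for (state, action), nxt in ltm.items():
            if state == s and nxt not in parent:
                parent[nxt] = (s, action)
                queue.append(nxt)
    pairs, s = set(), goal
    while parent.get(s):
        prev, action = parent[s]
        pairs.add((prev, action))
        s = prev
    return pairs

keep = shortest_path_pairs(LTM, "s0", "goal")
pruned_ltm = {pair: nxt for pair, nxt in LTM.items() if pair in keep}
print(pruned_ltm)   # pairs off the shortest path are treated as disturbances and dropped
```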

Solving Continuous Action/State Problem in Q-Learning Using Extended Rule Based Fuzzy Inference System

  • Kim, Min-Soeng;Lee, Ju-Jang
    • Transactions on Control, Automation and Systems Engineering / Vol. 3, No. 3 / pp.170-175 / 2001
  • Q-learning is a kind of reinforcement learning in which the agent solves the given task based on rewards received from the environment. Most research in the field of Q-learning has focused on discrete domains, although the environment with which the agent must interact is generally continuous. Thus we need methods that make Q-learning applicable to continuous problem domains. In this paper, an extended fuzzy rule is proposed so that it can incorporate Q-learning. The interpolation technique, which is widely used in memory-based learning, is adopted to represent the appropriate Q value for the current state-action pair in each extended fuzzy rule. The resulting structure, based on the fuzzy inference system, is capable of handling continuous states and actions of the environment. The effectiveness of the proposed structure is shown through simulation on the cart-pole system.
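
As a rough illustration of combining fuzzy rules with Q-learning over a continuous state (the rule form, memberships, and update below are assumptions, not the extended rule proposed in the paper): each rule holds one Q entry per discrete action, rule firing strengths interpolate a global Q value, and the TD error is distributed back to the rules in proportion to their strengths.

```python
import numpy as np

# Assumed setup: 1-D continuous state in [-1, 1], triangular memberships, discrete actions.
centers = np.linspace(-1.0, 1.0, 5)      # rule centers
actions = np.array([-1.0, 0.0, 1.0])     # discrete action set
q = np.zeros((len(centers), len(actions)))
alpha, gamma = 0.1, 0.95                 # assumed learning rate / discount

def firing(state, width=0.5):
    """Triangular membership of each rule, normalized to sum to 1."""
    mu = np.maximum(0.0, 1.0 - np.abs(state - centers) / width)
    return mu / mu.sum() if mu.sum() > 0 else np.ones_like(mu) / len(mu)

def q_values(state):
    """Interpolated Q value for each action: firing-strength-weighted sum of rule consequents."""
    return firing(state) @ q

def update(state, a_idx, reward, next_state):
    """Distribute the TD error over the rules according to their firing strengths."""
    td_error = reward + gamma * q_values(next_state).max() - q_values(state)[a_idx]
    q[:, a_idx] += alpha * td_error * firing(state)

# One illustrative step: in state 0.3, take action index 2, get reward 0.5, land in 0.4.
update(0.3, 2, 0.5, 0.4)
print(q_values(0.35))
```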


분할-결합 원리와 상태모형에 대한 학습이 모순문제 해결과 성장 마인드세트에 미치는 영향 (Learning Effects of Divide-and-Combine Principles and State Models on Contradiction Problem Solving and Growth Mindset)

  • 현정석;박찬정
    • 지식경영연구 / Vol. 14, No. 4 / pp.19-46 / 2013
  • This paper shows the learning process and the educational effects of the Divide-and-Combine principles and the State Models, which are included in the Butterfly Model for creative problem solving. The State Models comprise the Time State Model, the Space State Model, and the Whole-Parts State Model. We taught the proposed Models to middle school students (for 18 hours), high school students (for 24 hours), and undergraduate students (for one semester) while they solved contradiction problems, and had the students learn our contradiction-resolution algorithms on their own through team-based discussion. By learning and using our Models, the students attained a higher level of expertise in contradiction problems and developed a growth mindset that gave them confidence in themselves and kept them challenging themselves with problems. Learning and solving with our Models thus improved the students' growth mindset as well as their problem-solving ability.


함수근사와 규칙추출을 위한 클러스터링을 이용한 강화학습 (Reinforcement Learning with Clustering for Function Approximation and Rule Extraction)

  • 이영아;홍석미;정태충
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 30, No. 11 / pp.1054-1061 / 2003
  • Q-learning, a representative reinforcement learning algorithm, obtains the optimal policy by repeatedly experiencing all state-action pairs in the state space until their evaluation values converge. When the state space has many features or the features are continuous, the state space grows exponentially, so repeatedly visiting every state and storing the Q values of all state-action pairs becomes prohibitive in both time and memory. This paper introduces Q-Map, a new function approximation method that clusters similar states while learning online and repeatedly updates the clusters to adapt to new experiences, thereby obtaining a classified optimal policy. States that require fine-grained control, which clustering cannot capture, are complemented by extracting them as rules. Experiments on a maze environment and the mountain-car problem with the proposed Q-Map yielded classified knowledge that could also be easily converted into rules, an explicit form of knowledge.
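
The abstract describes Q-Map only at a high level, so the sketch below is an assumed, simplified version of the idea rather than the authors' algorithm: each state is assigned to its nearest cluster (a new cluster is opened when none is close enough), one Q-vector is kept per cluster, and centroids are adapted online.

```python
import numpy as np

class ClusteredQ:
    """Assumed simplification of a Q-Map-style approximator:
    one Q-vector per state cluster, clusters grown and updated online."""

    def __init__(self, n_actions, radius=0.5, alpha=0.1, gamma=0.95, lr_centroid=0.05):
        self.centroids, self.q = [], []
        self.n_actions, self.radius = n_actions, radius
        self.alpha, self.gamma, self.lr_centroid = alpha, gamma, lr_centroid

    def cluster_of(self, state):
        state = np.asarray(state, dtype=float)
        if self.centroids:
            dists = [np.linalg.norm(state - c) for c in self.centroids]
            i = int(np.argmin(dists))
            if dists[i] <= self.radius:
                # Move the matched centroid slightly toward the new state.
                self.centroids[i] += self.lr_centroid * (state - self.centroids[i])
                return i
        self.centroids.append(state.copy())          # open a new cluster
        self.q.append(np.zeros(self.n_actions))
        return len(self.centroids) - 1

    def update(self, state, action, reward, next_state):
        i, j = self.cluster_of(state), self.cluster_of(next_state)
        td = reward + self.gamma * self.q[j].max() - self.q[i][action]
        self.q[i][action] += self.alpha * td

qmap = ClusteredQ(n_actions=3)
qmap.update([0.1, 0.2], action=1, reward=1.0, next_state=[0.15, 0.25])
print(len(qmap.centroids), qmap.q[0])
```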

Multiple Behaviors Learning and Prediction in Unknown Environment

  • Song, Wei;Cho, Kyung-Eun;Um, Ky-Hyun
    • 한국멀티미디어학회논문지 / Vol. 13, No. 12 / pp.1820-1831 / 2010
  • When interacting with unknown environments, an autonomous agent needs to decide which action or action order can result in a good state, and to determine the transition probability based on the current state and the action taken. The traditional multiple sequential learning model requires predefined state-transition probabilities. This paper proposes a multiple sequential learning and prediction system with a definition of autonomous states to enhance the automatic performance of existing AI algorithms. In the sequence-learning process, the sensed states are classified into several groups by a set of proposed motivation filters to reduce the learning computation. In the prediction process, the learning agent makes decisions based on an estimate of each state's cost so as to obtain a high payoff from the given environment. The proposed learning and prediction algorithms enhance the autonomous agent's automatic planning for interacting with a dynamic unknown environment. The model was tested in a virtual library.
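
The paper's motivation filters and cost estimation are not detailed in this listing; as a minimal, assumed illustration of the bookkeeping involved, the sketch below estimates state-transition probabilities from observed (state, action, next state) sequences by counting, which is the quantity the abstract says must be determined from the current state and the action taken. The states and actions are hypothetical.

```python
from collections import defaultdict

# Counts of observed transitions: (state, action) -> {next_state: count} (illustrative data).
counts = defaultdict(lambda: defaultdict(int))

observed = [("shelf", "scan", "found"), ("shelf", "scan", "not_found"),
            ("shelf", "scan", "found"), ("found", "fetch", "done")]
for s, a, s_next in observed:
    counts[(s, a)][s_next] += 1

def transition_prob(state, action, next_state):
    """Empirical P(next_state | state, action) from the counted sequences."""
    total = sum(counts[(state, action)].values())
    return counts[(state, action)][next_state] / total if total else 0.0

print(transition_prob("shelf", "scan", "found"))   # 2/3
```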

Telecommunication Technologies As The Basis Of Distance Education

  • Hritchenko, Tetiana;Dekarchuk, Serhii;Byedakova, Sofiia;Shkrobot, Svitlana;Denysiuk, Nataliia
    • International Journal of Computer Science & Network Security / Vol. 21, No. 11 / pp.248-256 / 2021
  • The article discusses the evolution of distance learning in world practice; investigates the essence and modern content of the concepts of "distance learning" and "distance education"; studies the principles of distance learning in the educational process; analyzes the use of distance learning in higher educational institutions of Ukraine; substantiates the effectiveness of introducing distance learning into the higher education system; and forms new management approaches in the distance learning system. On the basis of this analysis, proposals for organizing and improving distance learning at the university are developed.

Reinforcement Learning Using a State Partition Method under Real Environment

  • Saito, Ken;Masuda, Shiro;Yamaguchi, Toru
    • 한국지능시스템학회:학술대회논문집 / ISIS 2003, 한국퍼지및지능시스템학회 / pp.66-69 / 2003
  • This paper considers reinforcement learning (RL) in real environments. Most reinforcement learning studies have been carried out in simulation because real-environment learning requires large computational cost and much time. Furthermore, it is more difficult to acquire many rewards efficiently in real environments than in virtual ones. The most important requirement for making real-environment learning successful is the appropriate construction of the state space. This paper first gives a basic overview of reinforcement learning in real environments. Next, it introduces a state-space construction method for real environments, the State Partition Method. Finally, the method is applied to a robot navigation problem and compared with conventional methods.
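
The State Partition Method itself is not described in enough detail in this listing to reproduce; a generic illustration of the problem it addresses, turning continuous sensor readings into a discrete state space that tabular RL can index, might look like the sketch below. The sensors and thresholds are assumptions.

```python
# Assumed example: a robot with three range sensors; each continuous reading is
# partitioned into coarse intervals so a tabular RL method can index the state.
THRESHOLDS = [0.3, 1.0]   # metres; boundaries between "near", "mid", "far" (assumed)

def partition(reading):
    """Map one continuous sensor reading to a partition index."""
    for i, t in enumerate(THRESHOLDS):
        if reading < t:
            return i
    return len(THRESHOLDS)

def state_id(readings):
    """Combine per-sensor partitions into a single discrete state identifier."""
    return tuple(partition(r) for r in readings)

print(state_id([0.2, 0.7, 2.5]))   # e.g. (0, 1, 2)
```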


가중 기여도를 이용한 퍼지 Q-learning (Fuzzy Q-learning using Weighted Eligibility)

  • 정석일;이연정
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2000 Fall Conference Proceedings / pp.163-167 / 2000
  • The eligibility trace is used to solve the credit-assignment problem, one of the important problems in reinforcement learning. Conventional eligibilities, the accumulating eligibility and the replacing eligibility, make ineffective use of the rewards acquired during learning, because only the executed action in a visited state is updated. We therefore propose a new eligibility, called the weighted eligibility, with which not only the executed action but also neighboring actions in a visited state are updated. A fuzzy Q-learning algorithm using the proposed eligibility is applied to a cart-pole balancing problem and shows improved learning speed.
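
A rough sketch of the weighted-eligibility idea as described in the abstract (the weighting function, the trace decay, and the SARSA(λ)-style update are assumptions about details the abstract does not give): when an action is executed in a state, neighboring actions of that state also receive eligibility, scaled by a weight that falls off with their distance from the executed action.

```python
import numpy as np

n_states, n_actions = 10, 5
Q = np.zeros((n_states, n_actions))
E = np.zeros((n_states, n_actions))          # eligibility traces
alpha, gamma, lam = 0.1, 0.95, 0.8           # assumed learning parameters

def action_weights(executed, sigma=1.0):
    """Weight for every action: 1 for the executed one, less for its neighbours."""
    idx = np.arange(n_actions)
    return np.exp(-0.5 * ((idx - executed) / sigma) ** 2)

def weighted_eligibility_update(s, a, reward, s_next, a_next):
    """SARSA(lambda)-style update in which the visited state's whole action row
    gains eligibility, weighted by closeness to the executed action."""
    global Q, E
    delta = reward + gamma * Q[s_next, a_next] - Q[s, a]
    E *= gamma * lam                                     # decay all traces
    E[s, :] = np.maximum(E[s, :], action_weights(a))     # replacing-style assignment
    Q += alpha * delta * E

weighted_eligibility_update(s=2, a=3, reward=1.0, s_next=3, a_next=2)
print(Q[2])   # the executed action and its neighbours all received credit
```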
