Integrated Search | Korea Science

A Function Approximation Method for Q-learning of Reinforcement Learning (강화학습의 Q-learning을 위한 함수근사 방법)

이영아;정태충
- Journal of KIISE: Software and Applications, Vol. 31, No. 11, pp. 1431-1438, 2004
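Reinforcement learning learns a strategy for achieving a goal through online interaction with an environment. To accelerate Q-learning, the basic reinforcement learning algorithm, a function approximation method is needed that can cope with very large state spaces (the curse of dimensionality) and that suits the characteristics of reinforcement learning. To address these problems, this paper proposes Fuzzy Q-Map, which is based on online fuzzy clustering. Fuzzy Q-Map is a function approximation method suited to reinforcement learning: it can learn online and can represent the uncertainty of the environment. Fuzzy Q-Map was applied to the mountain car problem, and learning was shown to be faster in the early stage of training.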
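For orientation, the standard tabular 1-step Q-learning update that such function approximators stand in for can be sketched as follows. This is a generic textbook form, not code from the paper; the function name `q_learning_step`, the dictionary-based table, and the default `alpha` and `gamma` values are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, n_actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular 1-step Q-learning update in place; return the TD error.

    `q` maps (state, action) pairs to values. Every distinct state needs its
    own entries, which is why large or continuous state spaces are costly.
    """
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Usage: an empty table, one rewarded transition from state 0 to state 1.
q = defaultdict(float)
q_learning_step(q, state=0, action=1, reward=1.0, next_state=1, n_actions=2)
```

Because the table grows with the number of distinct states visited, a continuous problem such as mountain car needs either discretization or an approximator of the kind the abstract proposes.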

Function Approximation for accelerating learning speed in Reinforcement Learning (강화학습의 학습 가속을 위한 함수 근사 방법)

이영아;정태충
- Journal of the Korean Institute of Intelligent Systems, Vol. 13, No. 6, pp. 635-642, 2003
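Reinforcement learning has produced successful results in many application areas such as control and scheduling. Various function approximation methods have been studied to improve the learning speed and to address the memory requirements of basic reinforcement learning algorithms such as Q-Learning, TD(λ), and SARSA. Most function approximation methods make assumptions that remove some characteristics of reinforcement learning, and they require prior knowledge and preprocessing. For example, Fuzzy Q-Learning needs preprocessing to define fuzzy variables, and local least squares uses a set of training examples. This paper proposes Fuzzy Q-Map, a function approximation method based on online fuzzy clustering. In an environment where prior knowledge is minimal, Fuzzy Q-Map classifies states arriving online by a distance-based membership degree and predicts actions. When Fuzzy Q-Map and two other function approximation methods, CMAC and LWR, were applied to the mountain car problem, Fuzzy Q-Map reached its peak prediction rate faster than CMAC, which uses no training examples, and showed a lower prediction rate than LWR, which uses training examples.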
https://doi.org/10.5391/JKIIS.2003.13.6.635

Solving Continuous Action/State Problem in Q-Learning Using Extended Rule Based Fuzzy Inference System

Kim, Min-Soeng;Lee, Ju-Jang
- Transactions on Control, Automation and Systems Engineering, Vol. 3, No. 3, pp. 170-175, 2001
Q-learning is a kind of reinforcement learning in which the agent solves a given task based on rewards received from the environment. Most research on Q-learning has focused on discrete domains, although the environment with which the agent must interact is generally continuous. Methods are therefore needed that make Q-learning applicable to continuous problem domains. In this paper, an extended fuzzy rule is proposed so that it can incorporate Q-learning. The interpolation technique, which is widely used in memory-based learning, is adopted to represent the appropriate Q value for the current state-action pair in each extended fuzzy rule. The resulting structure, based on a fuzzy inference system, can handle continuous states of the environment. The effectiveness of the proposed structure is shown through simulation on the cart-pole system.

Function Approximation for Reinforcement Learning using Fuzzy Clustering (퍼지 클러스터링을 이용한 강화학습의 함수근사)

이영아;정경숙;정태충
- The KIPS Transactions: Part B, Vol. 10B, No. 6, pp. 587-592, 2003
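Many real-world control problems well suited to reinforcement learning have continuous states or actions. With continuous values the state space becomes enormous, so learning every state-action pair runs into memory and time limits. Solving this requires a function approximation method that infers values for a new state from similar states already learned. This paper proposes Fuzzy Q-Map, based on fuzzy clustering, for function approximation in 1-step Q-learning. Fuzzy Q-Map uses each cluster's membership degree for the data to group similar states, select actions, and look up Q-values. The center and Q-value of the winning fuzzy cluster are updated using the membership degree and the TD (temporal difference) error. Applied to the mountain car problem, the proposed method converged quickly.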
https://doi.org/10.3745/KIPSTB.2003.10B.6.587
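The mechanism this abstract describes (membership-based lookup, then a winner update driven by membership and TD error) might look roughly like the sketch below. The abstract does not give the actual formulas, so everything concrete here is an assumption: the fuzzy c-means style membership with fuzzifier `m`, the membership-weighted Q lookup, the step sizes `alpha` and `eta`, and all function names are hypothetical.

```python
import math


def memberships(state, centers, m=2.0):
    """Membership of `state` in each cluster (assumed fuzzy c-means form)."""
    dists = [math.dist(state, c) for c in centers]
    # A state lying exactly on a center belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def q_values(state, centers, q_table):
    """Membership-weighted Q-value of every action at `state`."""
    mu = memberships(state, centers)
    n_actions = len(q_table[0])
    return [sum(mu[i] * q_table[i][a] for i in range(len(centers)))
            for a in range(n_actions)]


def update(state, action, reward, next_state, centers, q_table,
           alpha=0.1, gamma=0.9, eta=0.05):
    """Update the winning cluster's Q-value and center in place."""
    mu = memberships(state, centers)
    winner = max(range(len(centers)), key=lambda i: mu[i])
    td_error = (reward
                + gamma * max(q_values(next_state, centers, q_table))
                - q_values(state, centers, q_table)[action])
    # Assumed rule: scale the Q step by the winner's membership and TD error.
    q_table[winner][action] += alpha * mu[winner] * td_error
    # Assumed rule: pull the winner's center toward the observed state.
    centers[winner] = [c + eta * mu[winner] * (s - c)
                       for c, s in zip(centers[winner], state)]
    return winner, td_error
```

With two clusters centered at `[0.0, 0.0]` and `[1.0, 1.0]`, a state at `[0.25, 0.25]` gets memberships of about 0.9 and 0.1 under this assumed formula, so nearby clusters dominate both the Q lookup and the update.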

FuzzyQ-Learning to Process the Vague Goals of Intelligent Agent (지능형 에이전트의 모호한 목적을 처리하기 위한 FuzzyQ-Learning)

서호섭;윤소정;오경환
- Proceedings of the Korea Information Science Society Conference, Korea Information Science Society 2000 Spring Conference Proceedings Vol.27 No.1 (B), pp. 271-273, 2000
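In general, an intelligent agent should be able to find the optimal action on its own from the user's goal and its surrounding environment. When the agent's goal or environment involves uncertainty, it is difficult for the agent to choose an appropriate action. However, no research has addressed the case in which the user's goal is expressed as linguistic values that carry the uncertainty of human knowledge. This paper proposes a method that represents the user's vague intention as a fuzzy goal and the uncertain environment perceived by the agent as a fuzzy state. Using a fuzzy reinforcement function extended with the fuzzy goal and state, it also extends Q-Learning, one of the existing reinforcement learning algorithms, to FuzzyQ-Learning and verifies the validity of this extension.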

Fuzzy Q-learning using Weighted Eligibility (가중 기여도를 이용한 퍼지 Q-learning)

정석일;이연정
- Proceedings of the Korean Institute of Intelligent Systems Conference, Korea Fuzzy Logic and Intelligent Systems Society 2000 Fall Conference Proceedings, pp. 163-167, 2000
Eligibility is used to solve the credit-assignment problem, one of the important problems in reinforcement learning. Conventional eligibilities, namely accumulating eligibility and replacing eligibility, make ineffective use of the rewards acquired during learning because they train only the action executed in a visited state. We therefore propose a new eligibility, called the weighted eligibility, with which not only the executed action but also neighboring actions in a visited state are learned. The fuzzy Q-learning algorithm using the proposed eligibility is applied to a cart-pole balancing problem and shows improved learning speed.

Fuzzy Q-learning using Distributed Eligibility (분포 기여도를 이용한 퍼지 Q-learning)

정석일;이연정
- Journal of the Korean Institute of Intelligent Systems, Vol. 11, No. 5, pp. 388-394, 2001
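Reinforcement learning is a method in which an agent learns control rules from experience gained by interacting with its environment. Eligibility is used to solve the credit-assignment problem, one of the important problems in reinforcement learning, but methods using conventional eligibilities such as accumulating or replacing eligibility train only the action executed in a visited state and therefore do not make effective use of the reward signals obtained during learning. This paper proposes a new eligibility, the distributed eligibility, which allows not only the action executed in a visited state but also neighboring actions to be learned. A fuzzy Q-learning algorithm using the proposed eligibility is applied to an inverted pendulum system and shown to outperform existing methods in learning speed.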
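The contrast between the conventional eligibilities and one that also credits neighboring actions can be sketched as below. The accumulating and replacing rules are the standard ones. The `"neighbor"` mode is only a hypothetical illustration of the idea of spreading credit to adjacent actions; the paper's actual distributed (or weighted) eligibility formula is not given in the abstract, and `neighbor_weight` is an assumed parameter.

```python
def decay(traces, gamma, lam):
    """Decay every eligibility trace by gamma * lambda (standard step)."""
    for state in traces:
        traces[state] = [e * gamma * lam for e in traces[state]]


def mark_visit(traces, state, action, mode="accumulating",
               neighbor_weight=0.5):
    """Raise the eligibility for the action just executed in `state`."""
    row = traces[state]
    if mode == "accumulating":
        row[action] += 1.0      # standard accumulating trace
    elif mode == "replacing":
        row[action] = 1.0       # standard replacing trace
    elif mode == "neighbor":
        # Hypothetical variant: adjacent actions also receive partial credit.
        row[action] = 1.0
        for a in (action - 1, action + 1):
            if 0 <= a < len(row):
                row[a] = max(row[a], neighbor_weight)
    else:
        raise ValueError(f"unknown mode: {mode}")


# Usage: one state with three ordered actions (e.g. push left, none, right).
traces = {0: [0.0, 0.0, 0.0]}
mark_visit(traces, 0, 1, mode="neighbor")
```

Under the two conventional modes only the entry for action 1 changes, so a later TD error updates that action alone; in the neighbor mode actions 0 and 2 also carry nonzero eligibility and share in the update.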

Multiple Reward Reinforcement learning control of a mobile robot in home network environment

Kang, Dong-Oh;Lee, Jeun-Woo
- Proceedings of the Institute of Control, Robotics and Systems Conference, Institute of Control, Robotics and Systems ICCAS 2003, pp. 1300-1304, 2003
This paper deals with a control problem for a mobile robot in a home network environment. The home network requires the mobile robot to communicate with sensors to obtain sensor measurements and to adapt to changes in the environment. To improve control performance despite changes in the home network environment, we use a fuzzy inference system with multiple reward reinforcement learning. Multiple reward reinforcement learning enables the mobile robot to consider multiple control objectives and to adapt itself to changes in the home network environment. A multiple reward fuzzy Q-learning method is proposed for this purpose. Multiple Q-values are considered and max-min optimization is applied to obtain the improved fuzzy rule. To show the effectiveness of the proposed method, simulation results are given for home network environments such as LAN and wireless LAN.
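One common reading of max-min optimization over multiple Q-values is to score each action by its worst objective and pick the action whose worst case is best. The sketch below shows that reading only; the abstract does not say how the paper applies the rule inside its fuzzy rules, and the function name and example numbers are illustrative assumptions.

```python
def max_min_action(q_per_objective):
    """Pick the action whose worst Q-value across objectives is largest.

    `q_per_objective[k][a]` is the Q-value of action `a` under objective `k`.
    Returns the chosen action index and the per-action worst-case values.
    """
    n_actions = len(q_per_objective[0])
    worst_case = [min(q[a] for q in q_per_objective)
                  for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: worst_case[a])
    return best, worst_case


# Usage: two objectives (rows) and three candidate actions (columns).
action, worst = max_min_action([[1.0, 0.6, 0.2],
                                [0.1, 0.5, 0.9]])
```

Here action 0 is best for the first objective and action 2 for the second, but each is poor on the other, so the rule selects action 1, whose worst case of 0.5 is the highest.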

Object tracking algorithm of Swarm Robot System for using Polygon based Q-learning and parallel SVM

Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
- International Journal of Fuzzy Logic and Intelligent Systems, Vol. 8, No. 3, pp. 220-224, 2008
This paper presents polygon-based Q-learning and a parallel SVM algorithm for object search with multiple robots. We organized an experimental environment with one hundred mobile robots, two hundred obstacles, and ten objects. We then sent the robots to a hallway, where some obstacles were lying about, to search for a hidden object. In the experiment, we used four different control methods: a random search; a fusion model with Distance-based action making (DBAM) and Area-based action making (ABAM) processes to determine the next action of the robots; hexagon-based Q-learning; and dodecagon-based Q-learning with a parallel SVM algorithm to enhance the fusion model with the DBAM and ABAM processes. The results show that dodecagon-based Q-learning with the parallel SVM algorithm tracks objects better than the other algorithms.
https://doi.org/10.5391/IJFIS.2008.8.3.220

Fuzzy Logic Controlled Neural Network Learning

Hertz, D.B.;Hu, Q.
- Proceedings of the Korean Institute of Intelligent Systems Conference, Korea Fuzzy Logic and Intelligent Systems Society 1993 Fifth International Fuzzy Systems Association World Congress 93, pp. 1358-1361, 1993
