• Title/Abstract/Keywords: Fuzzy Reinforcement

Search results: 79 (processing time: 0.028 s)

Function Approximation for Accelerating Learning Speed in Reinforcement Learning

  • 이영아;정태충
    • Journal of Korean Institute of Intelligent Systems / Vol. 13, No. 6 / pp.635-642 / 2003
  • Reinforcement learning has achieved successful results in many application areas such as control and scheduling. To improve the learning speed of basic reinforcement learning algorithms such as Q-learning, TD(λ), and SARSA, and to address their memory requirements, various function approximation methods have been studied. Most function approximation methods discard some characteristics of reinforcement learning through simplifying assumptions and require prior knowledge and preprocessing; for example, Fuzzy Q-Learning needs preprocessing to define its fuzzy variables, and the local least-squares method uses a set of training examples. In this paper, we propose Fuzzy Q-Map, a function approximation method based on online fuzzy clustering. Given only minimal prior knowledge of the environment, Fuzzy Q-Map classifies states arriving online by their distance-based membership degrees and predicts actions. In experiments applying Fuzzy Q-Map, CMAC, and LWR to the mountain car problem, Fuzzy Q-Map reached its peak prediction rate faster than CMAC, which uses no training examples, but showed a lower prediction rate than LWR, which does use them.
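
As a rough illustration of the classification step this abstract describes, the sketch below scores an online state by distance-based membership over a set of cluster centers and predicts a greedy action. The Gaussian membership form, the width parameter, and all numeric values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def memberships(state, centers, width=0.5):
    """Distance-based membership of a state in each fuzzy cluster.

    The Gaussian form is an assumption; the abstract only says states
    are classified by distance-based membership degrees.
    """
    d2 = np.sum((centers - state) ** 2, axis=1)
    mu = np.exp(-d2 / (2.0 * width ** 2))
    return mu / (mu.sum() + 1e-12)   # normalize so memberships sum to 1

# Toy usage on a 2-D (position, velocity) state, as in mountain car.
centers = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])   # cluster centers
q_values = np.array([[0.1, 0.4], [0.2, 0.2], [0.5, 0.0]])   # per-cluster Q, 2 actions

state = np.array([0.3, 0.1])
mu = memberships(state, centers)
action = int(np.argmax(mu @ q_values))   # membership-weighted greedy action
```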

Multiple Reward Reinforcement Learning Control of a Mobile Robot in a Home Network Environment

  • Kang, Dong-Oh;Lee, Jeun-Woo
    • Institute of Control, Robotics and Systems: Conference Proceedings / ICCAS 2003 / pp.1300-1304 / 2003
  • This paper deals with a control problem for a mobile robot in a home network environment. The home network requires the mobile robot to communicate with sensors to obtain measurements and to adapt to changes in the environment. To improve control performance of the mobile robot despite changes in the home network environment, we use a fuzzy inference system with multiple reward reinforcement learning, which enables the robot to consider multiple control objectives and to adapt itself to environmental changes. A multiple reward fuzzy Q-learning method is proposed for this purpose: multiple Q-values are maintained, and max-min optimization is applied to obtain improved fuzzy rules. To show the effectiveness of the proposed method, simulation results obtained in home network environments (LAN, wireless LAN, etc.) are given.
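
A minimal sketch of the max-min idea, assuming one Q-table per objective and a toy random environment standing in for the home-network simulator; the update below is plain per-objective Q-learning, not necessarily the paper's exact fuzzy formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_objectives = 4, 3, 2

# One Q-table per control objective (e.g., goal-seeking vs. link quality).
Q = np.zeros((n_objectives, n_states, n_actions))

def select_action(s):
    # Max-min choice: pick the action whose worst objective value is best.
    return int(np.argmax(Q[:, s, :].min(axis=0)))

def update(s, a, rewards, s_next, alpha=0.1, gamma=0.95):
    # Independent Q-learning update per objective with its own reward signal.
    for k in range(n_objectives):
        target = rewards[k] + gamma * Q[k, s_next].max()
        Q[k, s, a] += alpha * (target - Q[k, s, a])

# Toy interaction loop with random transitions and rewards.
s = 0
for _ in range(100):
    a = select_action(s)
    s_next = int(rng.integers(n_states))
    update(s, a, rng.random(n_objectives), s_next)
    s = s_next
```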


Determination of Reinforcement Method for Abandoned Tunnel by Fuzzy Approximate Reasoning

  • 조만섭
    • Tunnel and Underground Space / Vol. 14, No. 4 / pp.275-286 / 2004
  • In this paper, decision-making techniques were examined to determine the reinforcement method for an abandoned tunnel crossing a new tunnel route. Among the various decision-making techniques, pairwise comparison and fuzzy approximate reasoning were used to evaluate the suitability of reinforcement methods for the abandoned tunnel, so as to minimize the questionnaire process while reflecting both the qualitative and quantitative characteristics of each survey item. Four main factors, namely constructability, economy, safety, and maintainability, were used as evaluation criteria, and their weights were determined through a simple questionnaire survey and a pairwise comparison matrix. Fuzzy approximate reasoning was used to compute the evaluation score for each of the four factors, and the final reinforcement method for the abandoned tunnel was selected by applying the weights to these results.
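
The weighting step can be indicated as follows: weights are derived from a pairwise comparison matrix by the common geometric-mean approximation, then applied to per-factor scores assumed to come out of the fuzzy approximate reasoning stage. All matrix entries, method names, and scores below are invented for illustration.

```python
import numpy as np

# Pairwise comparison matrix over the four factors
# (constructability, economy, safety, maintainability); entries illustrative.
A = np.array([
    [1.0, 2.0, 1/3, 1.0],
    [1/2, 1.0, 1/4, 1/2],
    [3.0, 4.0, 1.0, 2.0],
    [1.0, 2.0, 1/2, 1.0],
])

# Geometric-mean approximation of the principal eigenvector: a standard
# way to turn a pairwise comparison matrix into factor weights.
g = A.prod(axis=1) ** (1 / A.shape[0])
weights = g / g.sum()

# Per-factor scores for each candidate reinforcement method, assumed to
# come from the fuzzy approximate reasoning stage (values illustrative).
scores = {
    "backfill grouting": np.array([0.7, 0.8, 0.6, 0.5]),
    "structural lining": np.array([0.5, 0.4, 0.9, 0.8]),
}
best = max(scores, key=lambda m: weights @ scores[m])
print(best)   # method with the highest weighted score
```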

Fuzzy Classifier System for Edge Detection

  • Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 3, No. 1 / pp.52-57 / 2003
  • In this paper, we propose a Fuzzy Classifier System (FCS) to find a set of fuzzy rules that can carry out edge detection. Holland's classifier system evaluates the usefulness of rules, represented by classifiers, through repeated learning. The FCS enables the classifier system to map continuous inputs to outputs; it applies machine learning to fuzzy logic, so that the antecedent and consequent of a classifier are the same as those of a fuzzy rule. The FCS used here is Michigan-style: a single fuzzy if-then rule is coded as an individual. The average gray levels of each group of neighboring pixels are represented as fuzzy sets, and a pixel is then classified as an edge pixel or not using the fuzzy if-then rules. Depending on the average gray levels, a number of fuzzy rules can be activated, and each rule produces an output. These outputs are aggregated and defuzzified to obtain the new gray value of the pixel. To evaluate the edge detection, the new gray level of a pixel is compared with the gray level obtained by another edge detection method, such as Sobel edge detection; this comparison provides a reinforcement signal for the FCS, which learns by reinforcement. The FCS also employs genetic algorithms to create new rules and modify existing ones when the performance of the system needs to be improved.
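
A compact sketch of the rule-evaluation and reinforcement-signal steps, assuming triangular membership functions and weighted-average defuzzification; the rule base, gray values, and the Sobel stand-in are illustrative only, not the paper's rule set.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with breakpoints a <= b <= c."""
    return max(min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0)

def fcs_output(avg_levels, rules):
    """Evaluate Michigan-style fuzzy rules on neighborhood gray averages.

    Each rule is (triangular-set params per input, crisp consequent);
    the output is the weighted-average defuzzification of fired rules.
    """
    num = den = 0.0
    for sets, consequent in rules:
        w = min(tri(x, *p) for x, p in zip(avg_levels, sets))  # firing strength
        num += w * consequent
        den += w
    return num / den if den > 0 else 0.0

# Two illustrative rules over two neighborhood averages (gray in [0, 255]).
rules = [
    ([(0, 0, 128), (128, 255, 255)], 255.0),  # dark left, bright right -> edge
    ([(0, 128, 255), (0, 128, 255)], 0.0),    # both mid-gray -> no edge
]
out = fcs_output([40.0, 210.0], rules)
sobel_value = 230.0                      # stand-in for the Sobel result
reward = -abs(out - sobel_value)         # reinforcement signal for the FCS
```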

Function Approximation for Reinforcement Learning Using Fuzzy Clustering

  • 이영아;정경숙;정태충
    • The KIPS Transactions: Part B / Vol. 10B, No. 6 / pp.587-592 / 2003
  • Many real-world control problems suitable for reinforcement learning have continuous states or actions. For continuous-valued problems, the state space becomes enormous, so learning every state-action pair poses memory and time problems. To solve this, a function approximation method is needed that infers values for new states from similar, already-learned states. In this paper, we propose Fuzzy Q-Map, a fuzzy-clustering-based function approximation method for 1-step Q-learning. Fuzzy Q-Map groups similar states using the membership degree of each cluster for the incoming data, selects actions, and looks up Q-values. The center and Q-value of the winning fuzzy cluster are updated using the membership degree and the TD (temporal difference) error. Applied to the mountain car problem, the proposed method showed fast convergence.
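
The winner-update scheme this abstract describes might look roughly like the sketch below, where the winning cluster's center and Q-value are adjusted using the membership degree and the TD error; the Gaussian membership form and the learning rates are assumptions, not the paper's exact formulation.

```python
import numpy as np

class FuzzyQMap:
    """Minimal sketch of a fuzzy-clustering Q-function approximator."""

    def __init__(self, centers, n_actions, lr_c=0.05, lr_q=0.1, gamma=0.99):
        self.centers = np.asarray(centers, dtype=float)
        self.q = np.zeros((len(self.centers), n_actions))
        self.lr_c, self.lr_q, self.gamma = lr_c, lr_q, gamma

    def memberships(self, s):
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        mu = np.exp(-d2)                      # distance-based membership
        return mu / mu.sum()

    def q_value(self, s):
        return self.memberships(s) @ self.q   # membership-weighted Q per action

    def update(self, s, a, r, s_next):
        mu = self.memberships(s)
        winner = int(np.argmax(mu))           # winning cluster for this state
        td = r + self.gamma * self.q_value(s_next).max() - self.q_value(s)[a]
        # Move the winner's center toward the state and adjust its Q-value,
        # both scaled by the membership degree (TD error drives the Q step).
        self.centers[winner] += self.lr_c * mu[winner] * (s - self.centers[winner])
        self.q[winner, a] += self.lr_q * mu[winner] * td

# Toy usage on a 2-D state with 3 actions.
fqm = FuzzyQMap(centers=[[-0.5, 0.0], [0.5, 0.0]], n_actions=3)
fqm.update(np.array([0.1, 0.0]), a=1, r=-1.0, s_next=np.array([0.2, 0.01]))
```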

Solving Continuous Action/State Problem in Q-Learning Using Extended Rule Based Fuzzy Inference System

  • Kim, Min-Soeng;Lee, Ju-Jang
    • Transactions on Control, Automation and Systems Engineering / Vol. 3, No. 3 / pp.170-175 / 2001
  • Q-learning is a kind of reinforcement learning in which the agent solves the given task based on rewards received from the environment. Most research in Q-learning has focused on discrete domains, although the environment with which the agent must interact is generally continuous. Thus we need methods that make Q-learning applicable to continuous problem domains. In this paper, an extended fuzzy rule is proposed that can incorporate Q-learning. The interpolation technique widely used in memory-based learning is adopted to represent the appropriate Q-value for the current state-action pair in each extended fuzzy rule. The resulting structure, based on a fuzzy inference system, is capable of handling continuous states of the environment. The effectiveness of the proposed structure is shown through simulation on the cart-pole system.
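
One plausible reading of the extended fuzzy rule is sketched below: each rule stores sampled (action, Q) pairs, interpolates between them for an arbitrary continuous action, and the rule outputs are combined by firing strength. The Gaussian antecedents and all numbers are assumptions, not the paper's design.

```python
import numpy as np

class ExtendedFuzzyRule:
    """A fuzzy rule whose consequent is a set of (action, Q) sample points."""

    def __init__(self, center, width, action_pts, q_pts):
        self.center, self.width = np.asarray(center, dtype=float), width
        self.action_pts = np.asarray(action_pts, dtype=float)  # increasing
        self.q_pts = np.asarray(q_pts, dtype=float)

    def firing(self, s):
        return float(np.exp(-np.sum((s - self.center) ** 2) / self.width ** 2))

    def q(self, a):
        # Memory-based interpolation between stored action/Q samples.
        return float(np.interp(a, self.action_pts, self.q_pts))

def q_value(rules, s, a):
    """Fuzzy-inference Q(s, a): firing-strength-weighted rule outputs."""
    w = np.array([r.firing(s) for r in rules])
    q = np.array([r.q(a) for r in rules])
    return float(w @ q / (w.sum() + 1e-12))

rules = [ExtendedFuzzyRule([0.0, 0.0], 1.0, [-1, 0, 1], [0.2, 0.5, 0.1]),
         ExtendedFuzzyRule([1.0, 0.5], 1.0, [-1, 0, 1], [0.0, 0.3, 0.6])]
print(q_value(rules, np.array([0.4, 0.2]), 0.3))
```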


Control of a Crawling Robot Using Actor-Critic Fuzzy Reinforcement Learning

  • 문영준;이재훈;박주영
    • Journal of Korean Institute of Intelligent Systems / Vol. 19, No. 4 / pp.519-524 / 2009
  • Reinforcement learning has recently attracted much attention in the machine learning community. Among the approaches most widely used in reinforcement learning research are value-function methods, policy-search methods, and actor-critic methods. This paper deals with algorithms proposed within the actor-critic framework for problems with continuous states and continuous inputs. In particular, it focuses on combining ACFRL, an actor-critic reinforcement learning algorithm based on fuzzy theory, with the RLS-NAC method, which is based on the RLS filter and the natural actor-critic (NAC) technique. The considered methodology is applied to the control problem of a crawling robot, and some results obtained from comparing learning performance are reported.
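
ACFRL and RLS-NAC are considerably more involved than can be shown here; the sketch below is only a generic actor-critic loop with fuzzy-basis (normalized Gaussian) features and toy dynamics, intended to indicate the kind of critic/actor updates such methods build on, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(s):
    # Fuzzy-basis features: normalized Gaussian memberships over the state.
    centers = np.linspace(-1.0, 1.0, 5)
    mu = np.exp(-(s - centers) ** 2 / 0.1)
    return mu / mu.sum()

v_w = np.zeros(5)    # critic weights (value function)
pi_w = np.zeros(5)   # actor weights (mean of a Gaussian policy)
alpha_v, alpha_pi, gamma, sigma = 0.1, 0.01, 0.95, 0.2

s = 0.0
for _ in range(200):
    phi = features(s)
    a = pi_w @ phi + sigma * rng.standard_normal()   # exploratory action
    s_next = float(np.clip(s + 0.1 * a, -1.0, 1.0))  # toy dynamics
    r = -s_next ** 2                                 # toy reward: stay near 0
    td = r + gamma * (v_w @ features(s_next)) - (v_w @ phi)  # TD error
    v_w += alpha_v * td * phi                        # critic update
    # Actor update: Gaussian-policy gradient step weighted by the TD error.
    pi_w += alpha_pi * td * (a - pi_w @ phi) / sigma ** 2 * phi
    s = s_next
```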

Neuro-fuzzy optimisation to model the phenomenon of failure by punching of a slab-column connection without shear reinforcement

  • Hafidi, Mariam;Kharchi, Fattoum;Lefkir, Abdelouhab
    • Structural Engineering and Mechanics / Vol. 47, No. 5 / pp.679-700 / 2013
  • Two new predictive design methods are presented in this study. The first is a hybrid method, called neuro-fuzzy, based on neural networks with fuzzy learning. A total of 280 experimental datasets obtained from the literature, concerning concentric punching shear tests of reinforced concrete slab-column connections without shear reinforcement, were used to test the model (194 for experimentation and 86 for validation) and were endorsed by statistical validation criteria. The punching shear strength predicted by the neuro-fuzzy model was compared with that predicted by current punching shear models widely used in design practice, such as ACI 318-08, SIA 262, and CBA 93. The neuro-fuzzy model showed high predictive accuracy of resistance to punching relative to all of the relevant codes. A second, more user-friendly design method is presented, based on a predictive linear regression model that incorporates all the geometric and material parameters involved in predicting punching shear. Despite its simplicity, this formulation showed accuracy equivalent to that of the neuro-fuzzy model.
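
The second, regression-based method can be indicated with an ordinary least-squares sketch. The feature columns and every numeric value below are invented stand-ins chosen only to show the fitting step; none of it comes from the 280-test database or the paper's actual regressors.

```python
import numpy as np

# Hypothetical design matrix: one row per punching test, with columns such
# as effective depth (mm), column width (mm), reinforcement ratio, and
# concrete strength (MPa). All values are illustrative.
X = np.array([[120, 250, 0.010, 30.0],
              [150, 300, 0.012, 35.0],
              [100, 200, 0.008, 25.0],
              [180, 350, 0.015, 40.0],
              [140, 280, 0.011, 32.0],
              [160, 320, 0.013, 38.0]])
y = np.array([320.0, 450.0, 240.0, 610.0, 400.0, 520.0])  # strength, kN

# Ordinary least squares with an intercept column.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict(x):
    """Predicted punching shear strength for one feature vector."""
    return float(coef[0] + coef[1:] @ np.asarray(x, dtype=float))

print(predict([130, 260, 0.011, 32.0]))
```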

On the Implementation of Fuzzy Arithmetic for Prediction Model Equation of Corrosion Initiation

  • Do Jeong-Yun;Song Hun;Soh Yang-Seob
    • Journal of the Korea Concrete Institute / Vol. 17, No. 6 / pp.1045-1051 / 2005
  • For critical structures and applications where a given reliability must be met, it is necessary to account for uncertainties and variability in the material properties and structural parameters affecting the corrosion process, in addition to statistical and decision uncertainties. This paper presents an approach to fuzzy-arithmetic-based modeling of chloride-induced corrosion of reinforcement in concrete structures that takes into account the uncertainties in the physical models of chloride penetration into concrete and corrosion of steel reinforcement, as well as the uncertainties in the governing parameters, including concrete diffusivity, concrete cover depth, surface chloride concentration, and the critical chloride level for corrosion initiation. The model parameters are regarded as fuzzy numbers with membership functions adapted to statistical data of the governing parameters, and the fuzziness of the corrosion initiation time is determined by fuzzy arithmetic based on interval arithmetic and the extension principle.
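
The interval-arithmetic/extension-principle machinery can be sketched with alpha-cuts of triangular fuzzy numbers. The initiation-time formula below is a simplified monotone stand-in (the paper's model also involves surface and critical chloride concentrations), and the parameter values are illustrative, not the paper's data.

```python
def alpha_cut(tri_num, alpha):
    """Alpha-cut interval [lo, hi] of a triangular fuzzy number (a, m, b)."""
    a, m, b = tri_num
    return a + alpha * (m - a), b - alpha * (b - m)

# Triangular fuzzy parameters (illustrative values only):
cover = (40.0, 50.0, 60.0)   # concrete cover depth, mm
diff = (0.5, 1.0, 1.5)       # apparent chloride diffusivity (stand-in units)

def t_init(d, D):
    # Hypothetical stand-in for the corrosion initiation-time model:
    # increasing in cover depth, decreasing in diffusivity.
    return d ** 2 / (100.0 * D)

for alpha in (0.0, 0.5, 1.0):
    d_lo, d_hi = alpha_cut(cover, alpha)
    D_lo, D_hi = alpha_cut(diff, alpha)
    # Extension principle via interval arithmetic: since t_init is monotone
    # in each argument, the extreme values occur at interval endpoints.
    t_lo, t_hi = t_init(d_lo, D_hi), t_init(d_hi, D_lo)
    print(f"alpha={alpha:.1f}: t in [{t_lo:.1f}, {t_hi:.1f}] years")
```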

Fuzzy Inference-based Reinforcement Learning of Dynamic Recurrent Neural Networks

  • Jun, Hyo-Byung;Sim, Kwee-Bo
    • Journal of Korean Institute of Intelligent Systems / Vol. 7, No. 5 / pp.60-66 / 1997
  • This paper presents a fuzzy inference-based reinforcement learning algorithm for dynamic recurrent neural networks, which is very similar to the psychological learning method of higher animals. By using the fuzzy inference technique, linguistic and conceptual expressions affect the controller's action indirectly, as seen in human behavior. The intervals of the fuzzy membership functions are found optimally by genetic algorithms. And by using recurrent neural networks composed of dynamic neurons as action-generation networks, the past state as well as the current state is considered when generating an action in a dynamic environment. We show the validity of the proposed learning algorithm by applying it to the inverted pendulum control problem.
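
The GA step that tunes the membership-function intervals might look like the sketch below, with a stand-in fitness function replacing the actual controller evaluation on the inverted pendulum; the representation, operators, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(intervals):
    # Stand-in for controller performance (e.g., pole balancing time);
    # here we simply reward intervals close to an arbitrary target layout.
    target = np.array([0.2, 0.5, 0.8])
    return -np.sum((np.sort(intervals) - target) ** 2)

pop = rng.random((20, 3))   # 20 individuals, 3 membership breakpoints each
for gen in range(50):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]            # truncation selection
    kids = []
    for _ in range(10):
        p1, p2 = parents[rng.integers(10, size=2)]
        cut = int(rng.integers(1, 3))
        child = np.concatenate([p1[:cut], p2[cut:]])   # one-point crossover
        child += rng.normal(0, 0.02, size=3)           # Gaussian mutation
        kids.append(np.clip(child, 0.0, 1.0))
    pop = np.vstack([parents, np.array(kids)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(best)   # tuned membership-function breakpoints
```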
