• Title/Summary/Keyword: Policy actor


Actor-Critic Algorithm with Transition Cost Estimation

  • Sergey, Denisov;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.270-275 / 2016
  • We present an approach for accelerating the actor-critic algorithm for reinforcement learning with a continuous action space. The actor-critic algorithm has already proved robust to infinitely large action spaces in various high-dimensional environments. Despite that success, its main problem remains the same: the speed of convergence to the optimal policy. In high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we propose a search-accelerating function that speeds up convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own distribution of preference that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it would be more efficient if actions were taken according to some heuristic function. Using the Educational Process Mining dataset, which records students' course learning processes and their grades, we demonstrate that the heuristically accelerated actor-critic algorithm learns the optimal policy faster.
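The abstract does not give the form of the heuristic function. A minimal sketch, assuming the state-independent action preference is a fixed positive weight vector `H(a)` that is folded into the actor's softmax policy with a hypothetical strength `beta` (so `beta = 0` recovers the plain actor); all names here are illustrative, not the authors' method:

```python
import numpy as np


def heuristic_action_probs(preferences, heuristic, beta=1.0):
    """Combine state-dependent actor preferences with a state-independent
    heuristic prior over actions (hypothetical weighting scheme).

    preferences: actor logits h(s, a) for the current state, shape (n_actions,)
    heuristic:   positive, state-independent action weights H(a), shape (n_actions,)
    beta:        strength of the heuristic bias (beta = 0 ignores it)
    """
    logits = np.asarray(preferences, dtype=float) + beta * np.log(np.asarray(heuristic, dtype=float))
    logits -= logits.max()  # subtract the max for a numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum()


def select_action(rng, preferences, heuristic, beta=1.0):
    """Sample an action index from the heuristically biased policy."""
    probs = heuristic_action_probs(preferences, heuristic, beta)
    return rng.choice(len(probs), p=probs)


# Early in training the actor's preferences are flat, so the heuristic dominates.
rng = np.random.default_rng(0)
flat_prefs = np.zeros(3)
prior = np.array([0.6, 0.3, 0.1])  # hypothetical state-independent preference
print(heuristic_action_probs(flat_prefs, prior))  # ≈ [0.6, 0.3, 0.1]
print(select_action(rng, flat_prefs, prior))
```

Annealing `beta` toward 0 as training proceeds would let the learned preferences take over once they become informative; that schedule is also an assumption.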

Robot Locomotion via RLS-based Actor-Critic Learning (RLS 기반 Actor-Critic 학습을 이용한 로봇이동)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.234-237 / 2005
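  • Among the many methods for reinforcement learning, the actor-critic learning method based on policy iteration has been recognized for its potential through many applications. Actor-critic learning requires actor learning for the control-input selection strategy and critic learning for value-function approximation. This paper proposes a new algorithm that uses RLS (recursive least squares), which guarantees fast convergence, for critic learning and the policy gradient for actor learning. The performance of the proposed method was then verified experimentally.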

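The RLS critic can be sketched as recursive least-squares TD(0) on a linear value function. This is a minimal sketch, assuming hand-chosen features `phi(s)`, a discount `gamma`, and an initialization scale `delta`; the class and parameter names are hypothetical and not taken from the paper:

```python
import numpy as np


class RLSTDCritic:
    """Recursive least-squares TD(0) critic for a linear value function
    V(s) = theta @ phi(s). Each step costs O(n^2) and reproduces the
    (lightly regularized) least-squares TD solution without re-solving it."""

    def __init__(self, n_features, gamma=0.95, delta=1e3):
        self.gamma = gamma
        self.theta = np.zeros(n_features)
        self.P = delta * np.eye(n_features)  # large delta = weak initial prior

    def value(self, phi):
        return float(self.theta @ np.asarray(phi, float))

    def update(self, phi, reward, phi_next, done=False):
        """One RLS-TD(0) step; returns the TD error an actor can use."""
        phi = np.asarray(phi, float)
        phi_next = np.zeros_like(phi) if done else np.asarray(phi_next, float)
        d = phi - self.gamma * phi_next            # TD feature difference
        td_error = reward - d @ self.theta         # = r + gamma*V(s') - V(s)
        P_phi = self.P @ phi
        k = P_phi / (1.0 + d @ P_phi)              # RLS gain vector
        self.theta = self.theta + k * td_error
        self.P = self.P - np.outer(k, d @ self.P)  # Sherman-Morrison rank-one update
        return float(td_error)


# Two-state chain: 0 -> 1 with reward 1, 1 -> 0 with reward 0, gamma = 0.5.
# The exact values are V(0) = 4/3 and V(1) = 2/3.
critic = RLSTDCritic(n_features=2, gamma=0.5)
e = np.eye(2)  # one-hot features
for _ in range(100):
    critic.update(e[0], 1.0, e[1])
    critic.update(e[1], 0.0, e[0])
print(critic.theta)  # close to [1.3333, 0.6667]
```

The recursive form avoids inverting a matrix at every step, which is the property that makes RLS attractive for a fast-converging critic.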

Improved Deep Q-Network Algorithm Using Self-Imitation Learning (Self-Imitation Learning을 이용한 개선된 Deep Q-Network 알고리즘)

  • Sunwoo, Yung-Min;Lee, Won-Chang
    • Journal of IKEEE / v.25 no.4 / pp.644-649 / 2021
  • Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting its past good experiences. When Self-Imitation Learning is combined with reinforcement learning algorithms that have an actor-critic architecture, it improves performance in various game environments. However, its applications have been limited to such actor-critic algorithms. In this paper, we propose a method of applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train it in various game environments. By comparing the training results of the proposed algorithm with those of ordinary Deep Q-Network, we also show that Self-Imitation Learning can improve the performance of Deep Q-Network.
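The abstract does not state the exact loss used for Deep Q-Network. A minimal sketch, assuming the self-imitation value term of the original actor-critic formulation is transferred to the Q-value of the action taken, so a transition is imitated only when its observed episode return `R` exceeds `Q(s, a)`; the function names and the plain NumPy (non-autodiff) setting are illustrative assumptions:

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Monte-Carlo return R_t = sum_k gamma^k r_{t+k} for one finished episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sil_q_loss(q_taken, returns):
    """Self-imitation term for a Q-network: only transitions whose observed return
    beats the current estimate Q(s, a) contribute, as 0.5 * max(R - Q, 0)^2."""
    advantage = np.maximum(np.asarray(returns, float) - np.asarray(q_taken, float), 0.0)
    return float(np.mean(0.5 * advantage**2))


returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.9)  # [0.81, 0.9, 1.0]
q_taken = np.array([1.0, 0.5, 0.2])  # hypothetical Q(s_t, a_t) from the network
print(sil_q_loss(q_taken, returns))  # the first step is ignored because Q > R there
```

In a real DQN this term would be computed on a batch of network outputs, scaled by a weight, and added to the usual TD loss so that gradients flow only through `Q(s, a)`.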

Kernel-based actor-critic approach with applications

  • Chu, Baek-Suk;Jung, Keun-Woo;Park, Joo-Young
    • International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.4 / pp.267-274 / 2011
  • Recently, actor-critic methods have drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the lines of the actor-critic strategy. In this paper, we consider a new type of actor-critic algorithm employing kernel methods, which have recently been shown to be very effective tools in various fields of machine learning, and investigate how to combine the actor-critic strategy with kernel methods. More specifically, this paper studies actor-critic algorithms that use kernel-based least-squares estimation and the policy gradient; in the critic part, it uses a sliding-window-based kernel least-squares method, which yields fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated via a robot locomotion problem and a tunnel ventilation control problem.
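A sliding-window kernel least-squares critic could look like the sketch below, assuming a Gaussian kernel, a small ridge term `reg`, and the kernel LSTD system `K^T (K - gamma K') alpha = K^T r` solved over the most recent `window` transitions. The class name, kernel, and parameters are hypothetical choices, not the paper's exact method:

```python
from collections import deque

import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Gram matrix exp(-||x - y||^2 / (2 sigma^2)) between rows of X and rows of Y."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


class SlidingWindowKernelCritic:
    """Kernel least-squares TD critic over the most recent `window` transitions.

    V(x) = sum_j alpha_j k(x, s_j), where alpha solves the ridge-regularized
    kernel LSTD system K^T (K - gamma K') alpha = K^T r."""

    def __init__(self, window=50, gamma=0.95, sigma=0.5, reg=1e-6):
        self.buffer = deque(maxlen=window)  # oldest transitions drop out automatically
        self.gamma, self.sigma, self.reg = gamma, sigma, reg
        self.centers, self.alpha = None, None

    def add(self, s, r, s_next):
        self.buffer.append((np.asarray(s, float), float(r), np.asarray(s_next, float)))

    def fit(self):
        S = np.stack([s for s, _, _ in self.buffer])
        R = np.array([r for _, r, _ in self.buffer])
        S_next = np.stack([s2 for _, _, s2 in self.buffer])
        K = gaussian_kernel(S, S, self.sigma)            # K[i, j] = k(s_i, s_j)
        K_next = gaussian_kernel(S_next, S, self.sigma)  # K'[i, j] = k(s'_i, s_j)
        A = K.T @ (K - self.gamma * K_next) + self.reg * np.eye(len(R))
        self.alpha = np.linalg.solve(A, K.T @ R)
        self.centers = S

    def value(self, s):
        k = gaussian_kernel(np.atleast_2d(np.asarray(s, float)), self.centers, self.sigma)
        return float((k @ self.alpha)[0])


# A two-state chain as a sanity check (states encoded as 1-D vectors [0.0] and [1.0]).
critic = SlidingWindowKernelCritic(window=8, gamma=0.5)
for _ in range(20):
    critic.add([0.0], 1.0, [1.0])
    critic.add([1.0], 0.0, [0.0])
critic.fit()
print(critic.value([0.0]), critic.value([1.0]))  # approximately 4/3 and 2/3
```

Because the window bounds the Gram matrix at `window x window`, each refit costs a fixed amount of computation no matter how long the agent has been running.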

Robot Locomotion via RLS-based Actor-Critic Learning (RLS 기반 Actor-Critic 학습을 이용한 로봇이동)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.7 / pp.893-898 / 2005
  • Because it needs only a small amount of computation to find solutions and can handle stochastic policies explicitly, the actor-critic algorithm, a class of reinforcement learning methods, has recently attracted much interest in the area of artificial intelligence. The actor-critic network consists of the actor network, which selects control inputs, and the critic network, which estimates value functions. During training, the actor and critic networks adapt their parameters so as to select good control inputs and approximate the value functions accurately as fast as possible. In this paper, we consider a new actor-critic algorithm that employs an RLS (Recursive Least Squares) method for critic learning and policy gradients for actor learning. The applicability of the considered algorithm is illustrated with experiments on a two-link robot arm.
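For the actor side, a policy-gradient update driven by the critic's TD error can be sketched as follows, assuming a linear-Gaussian policy with a fixed exploration width `sigma` for a single continuous control input (an arm with several joints would need one mean per joint). The class name and hyperparameters are hypothetical:

```python
import numpy as np


class GaussianPolicyActor:
    """Linear-Gaussian actor a ~ N(w @ phi(s), sigma^2) for one continuous control
    input, updated along the TD-error-weighted policy gradient."""

    def __init__(self, n_features, sigma=0.3, lr=0.01, seed=0):
        self.w = np.zeros(n_features)
        self.sigma = sigma
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def act(self, phi):
        mean = self.w @ np.asarray(phi, float)
        return float(self.rng.normal(mean, self.sigma))

    def grad_log_pi(self, phi, action):
        # d/dw log N(a; w @ phi, sigma^2) = (a - w @ phi) / sigma^2 * phi
        phi = np.asarray(phi, float)
        return (action - self.w @ phi) / self.sigma**2 * phi

    def update(self, phi, action, td_error):
        # The critic's TD error acts as a low-variance estimate of the advantage.
        self.w += self.lr * td_error * self.grad_log_pi(phi, action)


actor = GaussianPolicyActor(n_features=1)
phi = np.array([1.0])
a = actor.act(phi)  # exploratory control input
actor.update(phi, action=0.5, td_error=1.0)  # positive TD error for a = 0.5 > mean 0
print(actor.w)  # the policy mean moves toward 0.5
```

In a full loop, the `td_error` argument would come from the RLS critic's update for the same transition.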

Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning (액터-크리틱 퍼지 강화학습을 이용한 기는 로봇의 제어)

  • Moon, Young-Joon;Lee, Jae-Hoon;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.519-524 / 2009
  • Recently, reinforcement learning methods have drawn much interest in the area of machine learning. The dominant approaches in reinforcement learning research include the value-function approach, the policy search approach, and the actor-critic approach; this paper concerns actor-critic algorithms for problems with continuous states and continuous actions. In particular, it presents a method that combines ACFRL (actor-critic fuzzy reinforcement learning), an actor-critic type of reinforcement learning based on fuzzy theory, with RLS-NAC, which is based on RLS filters and natural actor-critic methods. The presented method is applied to a control problem for crawling robots, and results comparing learning performance are reported.
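The abstract does not spell out the ACFRL update rules. As an illustration only, a fuzzy actor can be sketched as a zero-order Takagi-Sugeno system with triangular membership functions whose rule consequents are nudged by a TD-error-weighted exploration term; this update is a generic assumption, not the ACFRL or RLS-NAC rule:

```python
import numpy as np


class FuzzyActor:
    """Zero-order Takagi-Sugeno fuzzy actor on a scalar input.

    Rule i reads 'IF x is A_i THEN u = c_i', with triangular sets A_i centred on
    evenly spaced `centers`; the control output is the firing-weighted average
    of the consequents c_i, which are the trainable parameters."""

    def __init__(self, centers, consequents):
        self.centers = np.asarray(centers, float)
        self.c = np.asarray(consequents, float)
        self.width = self.centers[1] - self.centers[0]

    def firing(self, x):
        x = np.clip(x, self.centers[0], self.centers[-1])  # keep at least one rule active
        mu = np.clip(1.0 - np.abs(x - self.centers) / self.width, 0.0, None)
        return mu / mu.sum()  # normalized firing strengths

    def output(self, x):
        return float(self.firing(x) @ self.c)

    def update(self, x, exploration, td_error, lr=0.1):
        # Hypothetical rule: if the perturbed action (output + exploration) produced a
        # positive TD error, move the active rules' consequents toward the perturbation.
        self.c += lr * td_error * exploration * self.firing(x)


actor = FuzzyActor(centers=[-1.0, 0.0, 1.0], consequents=[-1.0, 0.0, 1.0])
print(actor.output(0.5))  # 0.5, since the rules centred at 0 and 1 fire equally
```

Only the rules that fire for the current input are adjusted, which keeps learning local in the input space.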

Investigating Science-Policy Interfaces in Japanese Politics through Climate Change Discourse Coalitions of an Environmental Policy Actor Network

  • Hartwig, Manuela G.
    • Journal of Contemporary Eastern Asia / v.18 no.2 / pp.90-117 / 2019
  • How is science advice integrated into environmental policymaking? This question has become increasingly pertinent since the 2011 nuclear catastrophe in Fukushima, Japan. The global re-evaluation of energy policies and climate mitigation measures includes discussions on how to better integrate science advice into policymaking while keeping science independent from political influence. This paper addresses the policy discourse on setting a national CO2 reduction target in Japanese policymaking between 2009 and 2012. The target proposed by the former DPJ government was turned down, and Japan lacked a clear strategy for long-term climate mitigation. The analysis provides explanations from a quantitative actor-network perspective. Centrality measures from social network analysis were calculated for policy actors in Japan's environmental policy network to identify the actors that control the discourse. The data come from the Global Environmental Policy Actor Network 2 (GEPON 2) survey conducted in Japan (2012-13). Science advice in Japan was kept independent from political influence and was mostly excluded from policymaking. One of the two largest discourse coalitions in the environmental policy network promoted a higher CO2 reduction target for international negotiations but favored lowering the target once a new international agreement was in place. This may explain why Japan struggled to commit to long-term mitigation strategies. Applying social network analysis to calculate discourse coalitions quantitatively proved a feasible methodology for investigating "discursive power," but it is limited in capturing the "practice" (e.g. meetings, telephone, or email conversations) among the actors in discourse coalitions.
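The abstract does not list which centrality measures were used. As the simplest example, normalized degree centrality `C_D(v) = deg(v) / (n - 1)` can be computed from a list of undirected ties; the actor labels below are hypothetical, not GEPON 2 data:

```python
from collections import defaultdict


def degree_centrality(edges, actors=()):
    """Normalized degree centrality C_D(v) = deg(v) / (n - 1) for an undirected
    network given as (actor, actor) ties; `actors` registers isolated actors."""
    neighbours = defaultdict(set)
    for actor in actors:
        neighbours.setdefault(actor, set())  # actors with no ties still count toward n
    for u, v in edges:
        if u != v:  # ignore self-ties
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {actor: 0.0 for actor in neighbours}
    return {actor: len(nbrs) / (n - 1) for actor, nbrs in neighbours.items()}


ties = [("Ministry", "NGO"), ("Ministry", "Business"), ("Ministry", "Scientists")]
print(degree_centrality(ties))  # Ministry -> 1.0, each other actor -> 1/3
```

For larger networks, NetworkX provides `networkx.degree_centrality` with the same normalization, along with betweenness and closeness measures.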

Trading Strategy Using RLS-Based Natural Actor-Critic algorithm (RLS기반 Natural Actor-Critic 알고리즘을 이용한 트레이딩 전략)

  • Kang Daesung;Kim Jongho;Park Jooyoung;Park Kyung-Wook
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.238-241 / 2005
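  • Recently, an increasing number of investors have been trying to trade effectively with the help of computers. Among the many artificial intelligence methodologies, this paper uses reinforcement learning to trade effectively. In particular, it examines the feasibility of a trading strategy based on the RLS-based natural actor-critic algorithm, which updates the actor's parameters with the natural policy gradient and updates the critic with the RLS (recursive least-squares) technique to estimate the value function efficiently.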

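A minimal sketch of an RLS-based natural actor-critic, assuming the critic is an RLS version of LSTD-Q with compatible features `psi = grad log pi(a|s)`, whose weights `w` estimate the natural policy gradient. The single-state problem with reward `-(a - 2)^2` is a hypothetical stand-in for a trading reward, not the paper's setup:

```python
import numpy as np


class RLSNaturalCritic:
    """RLS form of LSTD-Q with compatible features, as used in natural actor-critic.

    Model: r_t ~ phi(s_t) @ v - gamma * phi(s_{t+1}) @ v + psi_t @ w,
    where psi_t = grad_theta log pi(a_t | s_t). The weights w estimate the
    natural policy gradient."""

    def __init__(self, n_state_features, n_policy_params, gamma=0.95, delta=1e3):
        self.n_v = n_state_features
        n = n_state_features + n_policy_params
        self.gamma = gamma
        self.beta = np.zeros(n)      # stacked parameters [v, w]
        self.P = delta * np.eye(n)   # inverse of the (regularized) LSTD matrix

    def update(self, phi, psi, reward, phi_next):
        phi, psi = np.asarray(phi, float), np.asarray(psi, float)
        x = np.concatenate([phi, psi])  # regressor at time t
        x_next = np.concatenate([np.asarray(phi_next, float), np.zeros_like(psi)])  # no advantage term at t+1
        d = x - self.gamma * x_next
        Px = self.P @ x
        k = Px / (1.0 + d @ Px)                     # RLS gain
        self.beta += k * (reward - d @ self.beta)
        self.P -= np.outer(k, d @ self.P)           # Sherman-Morrison rank-one update

    @property
    def natural_gradient(self):
        return self.beta[self.n_v:]


# Hypothetical toy problem: one state, Gaussian action with mean theta, reward -(a - 2)^2.
rng = np.random.default_rng(0)
theta, sigma = 0.0, 1.0
critic = RLSNaturalCritic(n_state_features=1, n_policy_params=1, gamma=0.9)
phi = np.array([1.0])  # constant state feature
for _ in range(5000):
    a = rng.normal(theta, sigma)
    psi = np.array([(a - theta) / sigma**2])  # d/dtheta log N(a; theta, sigma^2)
    critic.update(phi, psi, -(a - 2.0) ** 2, phi)
w = critic.natural_gradient[0]
print(w)  # should be near sigma**2 * 2 * (2 - theta) = 4
theta += 0.1 * w  # actor step along the estimated natural gradient
```

For a Gaussian policy mean, the natural gradient equals `sigma^2` times the vanilla gradient `-2(theta - 2)`, which is why the estimate should land near 4 at `theta = 0`.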

A Study on the Changes in and Characteristics of Informatization Policies in Korea: Focusing on the Actor-Network (한국 정보화정책의 변천과 특징 - 행위자 연결망을 중심으로 -)

  • Han, Saeeok
    • Informatization Policy / v.17 no.4 / pp.23-43 / 2010
  • Informatization in Korea has undergone significant changes. So far, most studies on informatization policies have been based only on their structural or functional backgrounds. In practice, however, informatization policies have changed dynamically as a large number of people and organizations have participated in their formulation and implementation. This study therefore approaches them with an actor-network view that is distinct from, yet incorporates, a chronological perspective, which other studies have so far overlooked. This approach gives a clear picture of the changes in informatization policies and their characteristics from the Chun Doo Hwan government to the Lee Myung-bak government. On the basis of the actor-network view, the study finds that information and communication technologies, knowledge, and professionalism have dominated the characteristics and streams of informatization policies and brought about their changes.
