• Title/Summary/Keyword: Policy Learning (정책학습)

Search Results: 1,342

Policy Modeling for Efficient Reinforcement Learning in Adversarial Multi-Agent Environments (적대적 멀티 에이전트 환경에서 효율적인 강화 학습을 위한 정책 모델링)

  • Kwon, Ki-Duk;Kim, In-Cheol
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.3
    • /
    • pp.179-188
    • /
    • 2008
  • An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy through trial-and-error interactions in a dynamic environment where other agents exist that can influence its performance. Most previous works on multiagent reinforcement learning tend to apply single-agent reinforcement learning techniques without any extension, or rest on unrealistic assumptions even though they build and use explicit models of other agents. In this paper, the basic concepts that constitute the common foundation of multiagent reinforcement learning techniques are first formulated, and then, based on these concepts, previous works are compared in terms of their characteristics and limitations. After that, a policy model of the opponent agent and a new multiagent reinforcement learning method using this model are introduced. Unlike previous works, the proposed method utilizes a policy model instead of a Q-function model of the opponent agent. Moreover, it can improve learning efficiency by using a simpler policy model than richer but time-consuming ones such as finite state machines (FSMs) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed method is analyzed through experiments using this game as a testbed.
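
An opponent policy model of the kind described above could, as a minimal sketch, be an empirical distribution over the opponent's observed actions. The class name, state/action labels, and frequency-count estimator below are illustrative assumptions, not the paper's own model:

```python
from collections import defaultdict

class OpponentPolicyModel:
    """Empirical model of an opponent's policy: P(action | state),
    estimated from observed action frequencies per state."""

    def __init__(self, actions):
        self.actions = actions
        # counts[state][action] = number of times the action was observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action):
        self.counts[state][action] += 1

    def predict(self, state):
        total = sum(self.counts[state].values())
        if total == 0:
            # unseen state: fall back to a uniform prior
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: self.counts[state][a] / total for a in self.actions}
```

The learning agent can then condition its own action choice on `predict(state)` rather than maintaining a full Q-function model of the opponent.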

A Study on a Flow Model for Cyber Home Learning Using Structural Equation Modeling (구조방정식을 이용한 사이버 가정학습 몰입 모형에 관한 연구)

  • Baek, Hyeon-Gi;Ha, Tae-Hyeon
    • 한국디지털정책학회:학술대회논문집
    • /
    • 2007.06a
    • /
    • pp.361-375
    • /
    • 2007
  • This paper studies the relationships among the conditions that form learner flow in Korean cyber home learning, flow itself, and its effect on learning outcomes. The study is grounded in Csikszentmihalyi's (1990) flow theory, and an empirical analysis was conducted on data from 310 cyber home learning students participating in learning activities in a computer-mediated environment. Learner flow in cyber home learning was defined by five sub-constructs: 'enjoyment', 'telepresence', 'focused attention', 'involvement', and 'time distortion'; the antecedent conditions of flow were defined as differences in individuals' perceptions of the 'skill' required for learning and the degree of 'challenge' in performing the task. The Cyber-class Flow Measure was used for the empirical study, and after measuring actual flow, the characteristics of high-flow and low-flow learner groups were compared and analyzed. The analysis showed that flow in cyber home learning has a significant effect on evaluations of learning satisfaction.


Formal Model of Extended Reinforcement Learning (E-RL) System (확장된 강화학습 시스템의 정형모델)

  • Jeon, Do Yeong;Song, Myeong Ho;Kim, Soo Dong
    • Journal of Internet Computing and Services
    • /
    • v.22 no.4
    • /
    • pp.13-28
    • /
    • 2021
  • Reinforcement Learning (RL) is a machine learning approach that repeats a closed-loop process in which an agent performs actions specified by its policy, each action is evaluated with a reward function, and the policy is updated accordingly. The key benefit of RL is the ability to optimize the policy through action evaluation; hence, it can be effectively applied to developing advanced intelligent and autonomous systems. Conventional RL incorporates a single policy, a single reward function, and a relatively simple policy update, so its applicability has been limited. In this paper, we propose an extended RL model that considers multiple instances of these RL elements. We define a formal model of the key elements of the extended RL and their computing model, and then propose design methods for applying it to system development. As a case study of applying the proposed formal model and design methods, we present the design and implementation of an advanced car navigation system that guides multiple cars to their destinations efficiently.
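
The closed loop described in the abstract, extended to multiple reward-function instances, might be sketched as follows. The environment hooks (`env_reset`, `env_step`), the summing of reward-function outputs, and the hyperparameters are illustrative assumptions, not the paper's formal model:

```python
import random

def extended_rl_loop(env_reset, env_step, actions, reward_fns,
                     episodes=200, alpha=0.1, epsilon=0.1, gamma=0.9):
    """Tabular sketch of the RL closed loop: act by policy, evaluate the
    action with every reward function, update the policy (Q-table)."""
    q = {}
    for _ in range(episodes):
        s = env_reset()
        done = False
        while not done:
            # epsilon-greedy action from the current policy
            if random.random() < epsilon or s not in q:
                a = random.choice(actions)
            else:
                a = max(q[s], key=q[s].get)
            s2, done = env_step(s, a)
            # evaluate the action with every reward-function instance
            r = sum(f(s, a, s2) for f in reward_fns)
            q.setdefault(s, {b: 0.0 for b in actions})
            best_next = 0.0 if done else max(q.get(s2, {None: 0.0}).values())
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s2
    return q
```

With a single policy and a single reward function this reduces to conventional Q-learning; the extension is simply that several reward-function instances score each action.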

Q-Learning Policy and Reward Design for Efficient Path Selection (효율적인 경로 선택을 위한 Q-Learning 정책 및 보상 설계)

  • Yong, Sung-Jung;Park, Hyo-Gyeong;You, Yeon-Hwi;Moon, Il-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.2
    • /
    • pp.72-77
    • /
    • 2022
  • Among reinforcement learning techniques, Q-Learning learns an optimal policy by learning a Q-function that evaluates the action taken in a given state and predicts the expected future return. Q-Learning is widely used as a basic algorithm of reinforcement learning. In this paper, we studied the effectiveness of selecting and learning efficient paths by designing policies and rewards based on Q-Learning. In addition, applying the same number of learning iterations to the 8x8 grid environment of the Frozen Lake game, we compared the results of the existing algorithm, the punishment-compensation policy, and the proposed punishment-reinforcement policy. The comparison showed that the proposed Q-Learning punishment-reinforcement policy can significantly increase learning speed compared with conventional algorithms.
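
A rough sketch of this kind of experiment is tabular Q-learning on a FrozenLake-like grid with a configurable hole penalty, where `hole_penalty < 0` plays the role of a punishment-based reward design and `hole_penalty = 0` the baseline. The grid layout, penalty value, and hyperparameters below are our own assumptions, not the paper's exact design:

```python
import random

def q_learning_grid(holes, goal, n=8, episodes=2000, alpha=0.1,
                    gamma=0.95, epsilon=0.1, hole_penalty=-1.0):
    """Tabular Q-learning on an n x n grid: +1 at the goal, hole_penalty
    at holes (both terminal), 0 elsewhere."""
    moves = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    q = {(r, c): {a: 0.0 for a in moves} for r in range(n) for c in range(n)}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(500):  # step cap to guarantee termination
            a = (random.choice(list(moves)) if random.random() < epsilon
                 else max(q[s], key=q[s].get))
            dr, dc = moves[a]
            s2 = (min(max(s[0] + dr, 0), n - 1),
                  min(max(s[1] + dc, 0), n - 1))
            if s2 == goal:
                r, done = 1.0, True
            elif s2 in holes:
                r, done = hole_penalty, True
            else:
                r, done = 0.0, False
            target = r if done else r + gamma * max(q[s2].values())
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s2
    return q
```

Running this twice, once with `hole_penalty=0.0` and once with a negative value, and comparing how quickly the start-state Q-values rise, mirrors the kind of comparison the abstract describes.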

A Study on the Help for the Victims of Crime -Focusing on the Police Community in Korea- (범죄피해자 보호정책의 결정요인에 관한 연구 -경찰조직을 중심으로-)

  • Ahn, Hwang-Kwon
    • Korean Security Journal
    • /
    • no.9
    • /
    • pp.261-288
    • /
    • 2005
  • This paper examines the reality of, and the factors affecting, police assistance to crime victims. In Korean society, the relevant laws and policies for crime victims have been enacted and practiced, but it has not been easy for victims to receive help or support; on the contrary, victims have often been used as tools of investigation or merely as sources of testimony about the crime. These days, the issue of crime victims is one of the matters of greatest public concern, and it has become a core issue in the law enforcement community. According to the field survey conducted for this study, female officers and those of higher rank or longer career are more positive about helping crime victims than male officers and those of lower rank or shorter career. Moreover, if members of the law enforcement community were trained with an organized program to fulfill these duties, they could become more positive about helping crime victims.


Building an Ontology for Structured Diagnosis Data Entry of Educating Underachieving Students (구조화된 학습부진아 진단 자료 입력을 위한 온톨로지 개발)

  • Ha, Tae-Hyeon;Baek, Hyeon-Gi
    • 한국디지털정책학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.545-555
    • /
    • 2005
  • This study expresses diagnostic knowledge about underachieving students as an ontology, resolving mismatches in learning terminology between teachers and students and enabling inference based on information about underachievers during the diagnosis process. In addition, unlike typical diagnosis systems for underachievers that present only a specific diagnosis, this study proposes a way to build an ontology that uses this knowledge base to help users acquire accurate concept words (correct-answer words) and to progressively externalize and extend the conceptual knowledge embedded in the user's cognitive system.


Special Feature 1. STEPI International Symposium "Technological Innovation and Competitiveness in Newly Industrializing Countries": Presentation (Summary) Materials (초점기획- 1. STEPI 국제심포지움 「신흥공업국의 기술혁신과 경쟁력」주제발표(요약)자료)

  • Science & Technology Policy Institute
    • Science & Technology Policy
    • /
    • v.7 no.7 s.100
    • /
    • pp.22-36
    • /
    • 1997
  • Contents: 1. Development strategies in the learning economy 2. The role of R&D and technology transfer in the industrialization of developing countries 3. Korea's innovation system at a turning point 4. Innovation systems in East and Southeast Asia 5. Entry of user firms and the development of Korea's capital goods industry 6. Contracts, firm capabilities, and economic development: implications for newly industrializing countries 7. Implications for Asia of an Internet-based national innovation system 8. International technological cooperation and firm dynamics: implications for the NIEs 9. A case study of Samsung Semiconductor's dynamic technology management capability 10. A learning-based approach to horizontal technology policy: an evolutionary perspective 11. Effective technological innovation in Asia's newly industrializing economies 12. The role of science and technology policy in Korea's industrial development


A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.2
    • /
    • pp.93-99
    • /
    • 2011
  • In this paper, we propose a policy-gradient routing scheme based on reinforcement learning that can be used for adaptive QoS routing. Policy-gradient RL routing enables fast learning of the network environment by adapting the policy toward the optimum using gradient values of the average estimated reward. Faster learning of the network environment results in a higher routing success rate. To demonstrate this, we run simulations and compare the proposed scheme with three alternative schemes.
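
A generic policy-gradient learner of this flavor can be sketched as a REINFORCE-style gradient-bandit update with a running average-reward baseline. The route-reward hook, hyperparameters, and the bandit simplification are illustrative assumptions, not the paper's scheme:

```python
import math
import random

def policy_gradient_route(route_reward, n_routes, steps=2000, lr=0.1):
    """Softmax preferences over candidate routes, updated with the
    log-probability gradient scaled by (reward - running average reward)."""
    prefs = [0.0] * n_routes
    baseline = 0.0
    for t in range(1, steps + 1):
        # softmax policy over routes
        exps = [math.exp(p) for p in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        # sample a route from the current stochastic policy
        x, acc, route = random.random(), 0.0, n_routes - 1
        for i, p in enumerate(probs):
            acc += p
            if x < acc:
                route = i
                break
        r = route_reward(route)
        baseline += (r - baseline) / t  # running average-reward estimate
        adv = r - baseline
        # gradient of log pi for a softmax policy
        for i in range(n_routes):
            grad = (1.0 - probs[i]) if i == route else -probs[i]
            prefs[i] += lr * adv * grad
    return prefs
```

The average-reward baseline is what makes the gradient estimate low-variance, which is one way to read the abstract's claim of fast adaptation to the network environment.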

The Comparison of the Impact of IQ and Social Intelligence on the Compliance with Administrative Regulatory Policies (행정규제정책순응에 미치는 학습지능과 사회지능의 영향력 비교)

  • Ha, Ok-Hyun;Oh, Sae-Yoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.11
    • /
    • pp.247-256
    • /
    • 2009
  • The purpose of this study is to compare the impact of intelligence quotient (IQ) and social intelligence on compliance with administrative regulatory policies. The study found two things. First, the correlation between IQ and social intelligence is not high: no matter how high a person's IQ may be, his or her social intelligence will not necessarily be high in proportion to it. Second, the influence of social intelligence on compliance with administrative regulatory policies is greater than that of IQ, so a policy cannot be executed efficiently without taking social intelligence factors into account. The results imply that policy authorities and the citizens concerned should work to incorporate social intelligence factors into all stages of administrative regulatory policy: agenda setting, decision, implementation, evaluation, and feedback.

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

  • 박찬건;양성봉
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.672-680
    • /
    • 2003
  • A shopbot is a software agent whose goal is to maximize buyer satisfaction by automatically gathering price and quality information about goods, as well as services, from online sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learning agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs for convergence. When state-action pairs are selected uniformly at random, the number of accesses to the Q-table required to obtain the optimal Q-values is quite large, so this approach is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because uniform random selection fails to exploit the current estimate of the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both an auxiliary Markov process and the original Markov process and tries to keep exploration and exploitation in balance during reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than uniform random selection.
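
For context, the most common baseline for balancing exploration and exploitation is epsilon-greedy selection; this is not the MNP scheme itself, which the paper defines via an auxiliary Markov process, but a minimal sketch of the trade-off being balanced:

```python
import random

def epsilon_greedy(q, state, actions, epsilon):
    """Pick a random action with probability epsilon (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Uniform random selection corresponds to `epsilon = 1` and pure exploitation to `epsilon = 0`; schemes such as MNP sit between these extremes by mixing Markov processes rather than flipping a coin at every step.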