• Title/Summary/Keyword: Q Learning

Search Results: 430

Flipped Learning teaching model design and application for the University's "Linear Algebra" ('선형대수학' 플립드러닝(Flipped Learning) 강의 모델 설계 및 적용)

  • Park, Kyung-Eun;Lee, Sang-Gu
    • Communications of Mathematical Education
    • /
    • v.30 no.1
    • /
    • pp.1-22
    • /
    • 2016
  • We conducted a full-scale literature survey and case survey of Flipped Learning class models for mathematics. The purpose of this study is to design and adopt a Flipped Learning 'Linear Algebra' class model that fits our needs. We applied the new model to 30 students at S University and then analyzed the activities and performance of the students in this course. Our Flipped Learning 'Linear Algebra' teaching model proceeds in three stages: in the first stage, students view an online lecture as homework and take part in free question-and-answer on the Q&A board before class; in the second stage, in-class learning, the researcher resolves the students' Q&A and highlights the main ideas through a Point-Lecture; in the third stage, students explore more advanced topics by themselves on the Q&A board and the researcher (or peers) finalizes the students' Q&A. According to the survey, the teaching model contributed not only to increasing students' participation and interest but also to improving their communication and self-directed learning skills, both in class and online. We used purposive sampling on the obtained data, and for validity and reliability we used content validity and the alternate-form method. We found several meaningful outcomes from this analysis.

Barycentric Approximator for Reinforcement Learning Control

  • Whang Cho
    • International Journal of Precision Engineering and Manufacturing
    • /
    • v.3 no.1
    • /
    • pp.33-42
    • /
    • 2002
  • Recently, various experiments applying reinforcement learning methods to the self-learning intelligent control of continuous dynamic systems have been reported in the machine learning research community. The reports show mixed results, some successes and some failures, and indicate that the success of reinforcement learning in the intelligent control of continuous systems depends on the ability to combine a proper function approximation method with temporal difference methods such as Q-learning and value iteration. One of the difficulties in using a function approximation method in connection with a temporal difference method is the absence of a guarantee of convergence for the algorithm. This paper provides a proof of convergence for a particular function approximation method based on the "barycentric interpolator", which is known to be computationally more efficient than multilinear interpolation.
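
The abstract gives no implementation details, so the following is only a minimal sketch of how a barycentric interpolator can be combined with a temporal difference method: here a simple 1-D linear interpolation over a grid, where Q(s, a) is a convex combination of the Q-values stored at the neighbouring vertices and the TD error is pushed back to those vertices in proportion to their barycentric weights. The grid, action set, and all names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical setup: 1-D continuous state in [0, 1], a small discrete action set,
# and Q-values stored only at grid vertices (the interpolation points).
GRID = np.linspace(0.0, 1.0, 11)        # vertex locations
N_ACTIONS = 2
Q = np.zeros((len(GRID), N_ACTIONS))    # Q-values at the vertices

def barycentric_weights(s):
    """Return the two neighbouring vertex indices of s and their convex weights."""
    i = min(np.searchsorted(GRID, s) - 1, len(GRID) - 2)
    i = max(i, 0)
    t = (s - GRID[i]) / (GRID[i + 1] - GRID[i])
    return (i, i + 1), (1.0 - t, t)

def q_value(s, a):
    """Q(s, a) as a convex combination of vertex Q-values (barycentric interpolation)."""
    (i, j), (wi, wj) = barycentric_weights(s)
    return wi * Q[i, a] + wj * Q[j, a]

def td_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning step; the TD error is distributed to the vertices by their weights."""
    target = r + gamma * max(q_value(s_next, b) for b in range(N_ACTIONS))
    delta = target - q_value(s, a)
    (i, j), (wi, wj) = barycentric_weights(s)
    Q[i, a] += alpha * wi * delta
    Q[j, a] += alpha * wj * delta
```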

Reinforcement Learning based Dynamic Positioning of Robot Soccer Agents (강화학습에 기초한 로봇 축구 에이전트의 동적 위치 결정)

  • 권기덕;김인철
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10b
    • /
    • pp.55-57
    • /
    • 2001
  • Reinforcement learning is the process by which an agent learns an optimal action policy that maximizes the reward it receives from its environment. It therefore differs from supervised learning, in which explicit training examples are provided as input (state)/output (action) pairs. In particular, model-free reinforcement learning such as Q-learning requires no prior model of the environment and can reach an optimal policy as long as the various states and actions are experienced sufficiently often, so it has been applied in many domains. In practice, however, the biggest problem faced by Q-learning and similar methods is that, for problems with a large state space, they cannot converge to the optimal Q-values for every state and action within a reasonable time and therefore perform poorly. With this problem in mind, this paper proposes an effective new Q-learning method for the dynamic positioning of each player agent in a robot soccer simulation environment. The method improves on conventional Modular Q-Learning, which decomposes the state space of the original problem into several small modules and simply combines their individual Q-learning results, by adaptively combining the modules' learning results according to each module's contribution to the reward. This Adaptive Mediation based Modular Q-Learning (AMMQL) not only handles large state spaces, as conventional modular Q-learning does, but also has the additional advantage of adapting flexibly to dynamic changes in the environment and learning new action policies. Because of these characteristics, AMMQL is particularly effective in multi-agent environments such as robot soccer, where changes occur continuously in real time. This paper introduces the concept of AMMQL and presents a detailed design for applying it to learning the dynamic positioning of robot soccer agents.
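
The abstract describes the adaptive-mediation idea only at a high level, so the sketch below is a simplified guess at its shape: each module keeps its own Q-table over its sub-state, the modules' Q-values are combined with mediation weights, and those weights are re-derived from a crude, assumed proxy for each module's contribution to the reward. None of the names, sizes, or the credit rule are taken from the paper.

```python
import numpy as np

# Hypothetical sizes: each module sees its own small, discretized sub-state.
N_MODULES, N_STATES, N_ACTIONS = 3, 50, 5

q_tables = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_MODULES)]
weights = np.full(N_MODULES, 1.0 / N_MODULES)    # mediation weights, start uniform
credit = np.zeros(N_MODULES)                     # per-module estimate of reward contribution

def combined_q(sub_states):
    """Weighted sum of the modules' Q-rows: the 'adaptive mediation' of their opinions."""
    return sum(w * q[s] for w, q, s in zip(weights, q_tables, sub_states))

def select_action(sub_states, epsilon=0.1):
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(combined_q(sub_states)))

def step(sub_states, action, reward, next_sub_states,
         alpha=0.1, gamma=0.9, beta=0.05):
    global weights
    for m, q in enumerate(q_tables):
        s, s_next = sub_states[m], next_sub_states[m]
        # Ordinary per-module Q-learning update on the module's own sub-state.
        q[s, action] += alpha * (reward + gamma * q[s_next].max() - q[s, action])
        # Credit the module when the executed action was also its own greedy choice
        # (one simple, assumed proxy for "contribution to the reward").
        if action == int(np.argmax(q[s])):
            credit[m] += beta * (reward - credit[m])
    # Re-derive mediation weights from the credited contributions.
    shifted = credit - credit.min() + 1e-6
    weights = shifted / shifted.sum()
```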


Path selection algorithm for multi-path system based on deep Q learning (Deep Q 학습 기반의 다중경로 시스템 경로 선택 알고리즘)

  • Chung, Byung Chang;Park, Heasook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.1
    • /
    • pp.50-55
    • /
    • 2021
  • A multi-path system is a system that utilizes multiple networks simultaneously, and it is expected to enhance communication speed, reliability, and network security. In this paper, we focus on path selection in a multi-path system. To select the optimal path, we propose a deep reinforcement learning algorithm that is rewarded by the round-trip time (RTT) of each network. Unlike a multi-armed bandit model, deep Q learning is applied to handle rapidly changing situations. Because the RTT data arrive with a delay, we also suggest a compensation algorithm for the delayed reward. Moreover, we implement a testbed learning server to evaluate the performance of the proposed algorithm; the learning server contains a distributed database and a TensorFlow module to operate the deep learning algorithm efficiently. By means of simulation, we show that the proposed algorithm performs about 20% better than lowest-RTT path selection.
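
The distinctive elements in the abstract are the RTT-based reward and the compensation for its delayed arrival, neither of which is specified in detail, so the sketch below only illustrates one plausible shape of that logic: a pending-transition buffer that waits until the RTT measurement for a step comes back, wrapped around a generic Q-value estimator. The estimator interface, the negative-RTT reward, and all names are assumptions.

```python
import random
from collections import deque

class DelayedRewardPathSelector:
    """Epsilon-greedy path selection where the reward (negative RTT) arrives late."""

    def __init__(self, n_paths, q_estimator, epsilon=0.1):
        self.n_paths = n_paths
        self.q = q_estimator            # assumed interface: q.predict(state) -> list of Q-values,
                                        #                    q.update(state, action, reward, next_state)
        self.epsilon = epsilon
        self.pending = {}               # step_id -> (state, action) awaiting its RTT
        self.replay = deque(maxlen=10_000)  # transitions that would feed periodic Q-network training

    def select_path(self, step_id, state):
        if random.random() < self.epsilon:
            action = random.randrange(self.n_paths)
        else:
            action = max(range(self.n_paths), key=lambda a: self.q.predict(state)[a])
        self.pending[step_id] = (state, action)
        return action

    def report_rtt(self, step_id, rtt, next_state):
        """Called whenever a delayed RTT measurement comes back; completes the transition."""
        if step_id not in self.pending:
            return
        state, action = self.pending.pop(step_id)
        reward = -rtt                   # assumed reward: smaller RTT is better
        self.replay.append((state, action, reward, next_state))
        self.q.update(state, action, reward, next_state)
```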

Q-NAV: NAV Setting Method based on Reinforcement Learning in Underwater Wireless Networks (Q-NAV: 수중 무선 네트워크에서 강화학습 기반의 NAV 설정 방법)

  • Park, Seok-Hyeon;Jo, Ohyun
    • Journal of Convergence for Information Technology
    • /
    • v.10 no.6
    • /
    • pp.1-7
    • /
    • 2020
  • The demand for underwater communications is increasing rapidly for underwater resource exploration, marine expeditions, and environmental research, yet wireless communication faces many problems because of the characteristics of the underwater environment. In particular, underwater wireless networks suffer from unavoidable delay and spatial inequality caused by the distances between nodes. To solve these problems, this paper proposes a new solution based on ALOHA-Q. The proposed method starts with random NAV values, takes a reward from each communication success or failure, and then sets the NAV value from that reward. The model minimizes the energy and computing resources used in underwater wireless networks, learning and setting NAV values through reinforcement learning. Simulation results show that the NAV value adapts to the environment and the best value for the circumstances is selected, so the unnecessary delay and spatial inequality can be resolved. In the simulations, the NAV time decreased by 17.5% compared with the original NAV.
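
The abstract outlines the loop only broadly (random initial NAV, a reward from each success or failure, the NAV value then set from that reward); the ALOHA-Q-style sketch below fills in that loop under assumed details: a small set of candidate NAV durations, a +1/-1 reward, and epsilon-greedy selection.

```python
import random

# Assumed candidate NAV durations (arbitrary units) for a node to choose from.
NAV_CANDIDATES = [10, 20, 40, 80, 160]
q_values = {nav: 0.0 for nav in NAV_CANDIDATES}

def choose_nav(epsilon=0.1):
    """Epsilon-greedy choice of the NAV value; behaves randomly at the start."""
    if random.random() < epsilon:
        return random.choice(NAV_CANDIDATES)
    return max(q_values, key=q_values.get)

def update_nav(nav, success, alpha=0.1):
    """ALOHA-Q-style update: reward +1 on a successful exchange, -1 on failure."""
    reward = 1.0 if success else -1.0
    q_values[nav] += alpha * (reward - q_values[nav])

# Usage sketch, one contention round per iteration (attempt_transmission is hypothetical):
# nav = choose_nav(); success = attempt_transmission(nav); update_nav(nav, success)
```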

Analysis on the Secondary Pre-Physical Education Teacher's Recognition for the Learning Athletics Using the Q Methodology (Q방법론을 활용한 중등예비체육교사의 육상운동에 대한 인식 연구)

  • Yu, Young-Seol
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.4
    • /
    • pp.311-321
    • /
    • 2020
  • The purpose of this study was to analyze secondary pre-service physical education teachers' perceptions of learning athletics using Q methodology. The P-sample was composed of 28 pre-service secondary P·E teachers. The selected Q samples were arranged in a normal distribution form, and the collected data were analyzed by factor analysis with varimax rotation using the QUANL PC program. The study found four types of perception of learning athletics: Type I is 'the type recognizing educational value,' Type II is 'the type emphasizing assistant activities,' Type III is 'the type appealing to the difficulty of learning athletics skills,' and Type IV is 'the type emphasizing the value of basic movement.' Based on these results, implications and directions for future research on athletics activities are suggested.

A Case Study on The Application of Team-Based Learning by Culinary Major University Students to Culinary Skills Subjects (조리실무과목에 대한 조리전공 대학생의 팀기반학습(TBL) 적용사례 연구)

  • Kim, Chan-Woo;Chung, Hyun-Chae
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.5
    • /
    • pp.327-337
    • /
    • 2020
  • This study analyzed the subjective cognitive types of culinary majors by applying TBL to culinary practice subjects, using Q methodology for a multifaceted analysis of learners' subjective cognitive types. For the analysis, interviews were conducted with college students majoring in cooking, and the study proceeded in the order of constructing the Q population, selecting the P-sample, Q sorting, interpreting the results, and drawing conclusions and discussion. Four types were derived from the type analysis and named according to their characteristics: Type 1 (N = 8), Cooperative Learning Effect Type; Type 2 (N = 8), Problem Solving Ability Effect Type; Type 3 (N = 6), Self-Directed Learning Effect Type; and Type 4 (N = 6), Individual Practice Preference Type. The results are expected to provide important implications for the study of similar teaching methods in the future and for fostering talent that can meet the needs of industry and society.

Labeling Q-Learning for Maze Problems with Partially Observable States

  • Lee, Hae-Yeon;Hiroyuki Kamaya;Kenich Abe
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2000.10a
    • /
    • pp.489-489
    • /
    • 2000
  • Recently, Reinforcement Learning (RL) methods have been used for learning problems in Partially Observable Markov Decision Process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs. To overcome the partial observability, several algorithms were proposed [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which reinforces the incomplete information of perception with labeling. Namely, in LQ-learning the agent perceives the current state as a pair of the observation and its label, so the agent can more exactly distinguish states that look the same. Labeling is carried out by a hash-like function, which we call the Labeling Function (LF). Numerous labeling functions can be considered, but in this paper we introduce several labeling functions based on only the 2 or 3 most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems, which are simple POMDP environments, and show its effectiveness with empirical results that compare favorably with conventional RL algorithms.
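
The exact labeling function is left open in the abstract, so the sketch below only illustrates the (observation, label) idea, assuming a label computed as a small hash of the two most recent observations and an otherwise standard tabular Q-learning update; the names and the particular hash are illustrative.

```python
from collections import defaultdict, deque
import random

N_ACTIONS = 4
N_LABELS = 8                                 # assumed label alphabet size
Q = defaultdict(lambda: [0.0] * N_ACTIONS)   # Q-table keyed by (observation, label)

history = deque(maxlen=2)                    # the 2 most recent observations

def labeling_function():
    """Hash-like LF over the recent observation history (an assumed, simple choice)."""
    return hash(tuple(history)) % N_LABELS

def perceive(observation):
    """The agent's internal state is the pair (observation, label)."""
    history.append(observation)
    return (observation, labeling_function())

def q_learning_step(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])

def select_action(state, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])
```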


Reinforcement Learning Approach for Resource Allocation in Cloud Computing (클라우드 컴퓨팅 환경에서 강화학습기반 자원할당 기법)

  • Choi, Yeongho;Lim, Yujin;Park, Jaesung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.4
    • /
    • pp.653-658
    • /
    • 2015
  • Cloud service is one of the major challenges in the IT industry. In a cloud environment, service providers predict dynamic user demands and provision resources to guarantee QoS to cloud users. Conventional prediction models guarantee QoS to the cloud user but do not guarantee the profit of the service provider. In this paper, we propose a new resource allocation mechanism using a Q-learning algorithm to provide QoS to the cloud user while guaranteeing the profit of the service provider. To evaluate the performance of our mechanism, we compare its total expense and VM provisioning delay with those of conventional techniques using real data.
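
The abstract does not give the state, action, or reward design, so the following is only a plausible Q-learning provisioning loop, assuming the state is a discretized demand level paired with the current VM count, the actions scale the VM pool up or down by one, and the reward trades off an SLA shortfall penalty against VM cost; all of these are assumptions for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, +1)                               # remove one VM, hold, add one VM
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # keyed by (demand_level, n_vms)

def reward(demand_level, n_vms, vm_cost=1.0, sla_penalty=5.0):
    """Assumed reward: pay for every running VM, plus a penalty when capacity falls short."""
    shortfall = max(0, demand_level - n_vms)
    return -(vm_cost * n_vms + sla_penalty * shortfall)

def select_action(state, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def q_update(state, action, r, next_state, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])
```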

R-Trader: An Automatic Stock Trading System based on Reinforcement learning (R-Trader: 강화 학습에 기반한 자동 주식 거래 시스템)

  • 이재원;김성동;이종우;채진석
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.11
    • /
    • pp.785-794
    • /
    • 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems, such as market trend prediction, stock selection, and trading strategies, in a unified framework. However, most previous trading systems based on supervised learning have a limit in ultimate performance because they are not mainly concerned with integrating those subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement learning, regarding the process of stock price changes as a Markov decision process (MDP). Reinforcement learning is suitable for the joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) learning and Q-learning, for selecting stocks and optimizing other trading parameters, respectively. Technical analysis is also adopted to devise the input features of the system, and the value functions are approximated by feedforward neural networks. Experimental results on the Korean stock market show that the proposed system outperforms both the market average and a simple trading system trained by supervised learning, in both profit and risk management.
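
The abstract names TD and Q-learning with feedforward neural-network approximators but gives no further detail; the sketch below shows only a semi-gradient Q-update on technical-indicator features, and for brevity uses a linear approximator in place of the paper's feedforward network. The feature count and the action set are illustrative assumptions.

```python
import numpy as np

ACTIONS = ("buy", "hold", "sell")                 # assumed trading actions
N_FEATURES = 8                                    # assumed number of technical-analysis features
W = np.zeros((len(ACTIONS), N_FEATURES))          # one weight vector per action

def q_values(features):
    """Q(s, a) approximated as a linear function of the technical features."""
    return W @ features

def semi_gradient_q_update(features, action, reward, next_features,
                           alpha=0.01, gamma=0.99):
    """One semi-gradient Q-learning step (a stand-in for the paper's neural approximator)."""
    a = ACTIONS.index(action)
    target = reward + gamma * np.max(q_values(next_features))
    td_error = target - q_values(features)[a]
    W[a] += alpha * td_error * features
```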