• Title/Summary/Keyword: reward function

Search Result 93

Deep Q-Network based Game Agents (심층 큐 신경망을 이용한 게임 에이전트 구현)

  • Han, Dongki;Kim, Myeongseop;Kim, Jaeyoun;Kim, Jung-Su
    • The Journal of Korea Robotics Society / v.14 no.3 / pp.157-162 / 2019
  • Tetris is one of the most popular video games, and it is well known that its rules can be modelled as an MDP (Markov Decision Process). This paper presents a DQN (Deep Q-Network) based game agent for Tetris. To this end, the state is defined as the captured image of the Tetris game board, and the reward is designed as a function of the lines cleared by the game agent. The action is defined as left, right, rotate, drop, and a finite number of their combinations. In addition, PER (Prioritized Experience Replay) is employed to enhance learning performance. More than 500,000 episodes are used to train the network. The game agent employs the trained network to make decisions. The performance of the developed algorithm is validated not only in simulation but also on a real Tetris robot agent made of a camera, two Arduinos, four servo motors, and 3D-printed artificial fingers.
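
The abstract names two ingredients that a short sketch can make concrete: a reward that depends on the number of cleared lines, and prioritized experience replay. The block below is a minimal illustration, not the authors' implementation. The reward table `LINE_REWARD`, the function names, and the values of `alpha` and `eps` are assumptions; the sampling rule is the standard proportional form, P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |TD error_i| + eps.

```python
import random

# Hypothetical reward table: the abstract only states that the reward is a
# function of cleared lines, so these values are illustrative assumptions.
LINE_REWARD = {0: 0.0, 1: 1.0, 2: 3.0, 3: 5.0, 4: 8.0}


def line_clear_reward(cleared_lines: int) -> float:
    """Map the number of lines cleared by one placement to a scalar reward."""
    return LINE_REWARD[cleared_lines]


def per_probabilities(td_errors, alpha=0.6, eps=1e-3):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""
    # eps keeps transitions with zero TD error from never being replayed.
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=random):
    """Draw replay-buffer indices in proportion to their priorities."""
    probs = per_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

With `alpha=0` the sampling becomes uniform, which recovers ordinary experience replay; larger `alpha` replays high-error transitions more often.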

Study on the Psychobiological Characteristics of Sasang Typology Based on the Type-Specific Pathophysiological Digestive Symptom (사상 소화기능 소증에 따른 체질별 생리심리 특성 연구)

  • Chae, Han;Kim, Sung Hye;Han, Seung Yoon;Lee, Sang Jae;Kim, Byung Joo;Kwon, Young Kyu;Lee, Soo Jin
    • Journal of Physiology & Pathology in Korean Medicine / v.28 no.4 / pp.417-424 / 2014
  • The purpose of this study was to analyze the psychobiological traits of each Sasang type based on the Sasang Digestive Function Inventory (SDFI), which measures Sasang type-specific pathophysiological digestive symptoms. The SDFI, Temperament and Character Inventory (TCI), and NEO-Personality Inventory (NEOPI) were administered to 199 college students. Correlations among the SDFI, TCI, and NEOPI were measured with Pearson correlation coefficients. The influence of TCI, sex, and age on the SDFI and its subscales was analyzed with regression analysis. We also compared psychobiological features between the high and low SDFI score groups to elucidate their psychobiological profiles. There was a significant correlation between SDFI and TCI Harm-Avoidance (r=-0.192, p<0.001). The SDFI subscales were shown to have significant correlations with subscales of the NEOPI and TCI. The regression model with TCI can explain 8-16% of type-specific pathophysiological digestive symptoms. The low SDFI score group ($39 \pm 9.3$) scored significantly higher (p=0.007) than the high SDFI score group ($33.6 \pm 12.2$) on TCI Harm-Avoidance, which is considered important for gastrointestinal dysfunction and So-Eum type differentiation. We found that the TCI may explain the mechanism underlying the Sasang type-specific pathophysiological symptoms. It was suggested that TCI Reward-Dependence would be useful for the study of the Tae-Eum Sasang type, and its clinical meaning was discussed from a pathophysiological perspective.

A study on the impact of homestay sharing platform on guests' online comment willingness

  • Zou, Ji-Kai;Liang, Teng-Yue;Dong, Cui
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.321-331 / 2020
  • The purpose of this study is to explore the impact of the homestay platform on guests' willingness to comment online under the shared homestay business model. In addition to providing a variety of support services that help the landlord deliver offline accommodation services to the tenant and complete transactions, a homestay sharing platform needs to take measures to actively encourage tenants to evaluate landlords, so that the evaluations are objective, effective, and sufficient in number, in order to better promote the establishment of a credit ecosystem for shared homestays. In this study, consumers who have used a shared homestay platform are taken as the research subjects. The survey was conducted as an online questionnaire using a seven-point Likert scale. The statistical software SPSS 24.0 was used to process the data. First, descriptive statistical analysis was conducted, followed by validity analysis and reliability analysis. After the reliability and validity of the questionnaire were confirmed, correlation analysis and regression analysis were used to test the proposed hypotheses. The results of this study are summarized as follows: (1) the usability of the platform's comment function, guest satisfaction, and platform rewards have a positive impact on guests' willingness to comment online; (2) the credit mechanism of the platform has a positive moderating effect on the influence of tenant satisfaction on tenant comment intention.

A Level System Design for Achievement-assessing of Serious Game (기능성게임의 성취도 평가를 위한 레벨시스템 설계)

  • Yoon, Seon-Jeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.9 / pp.2038-2044 / 2011
  • Serious games are selected by users according to their original goals, such as education, treatment, and training. Therefore, these types of games are evaluated, both inside and outside the game, on whether the goals are achieved. Among the quality test elements of a serious game, assessment concerns whether the game includes the ability to verify goal achievement. In this paper, we examined the achievement-assessing function of serious games through several cases. Furthermore, for use in developing serious games for English learning, we designed a level system to which the achievement-assessing function is applied. In this level system, we used 'competition and reward' as the core elements of the game, and designed the system through a simulation in which levels are assigned to grades according to the user's English proficiency level, based on a notice of MEST (Ministry of Education, Science and Technology). This paper is expected to be a useful reference for designing English learning games that contain an achievement-assessing function.

The Effects of Open-ended Problems on Mathematical Creativity and Brain Function (개방형 문제 활용이 수학적 창의력과 뇌기능에 미치는 효과)

  • Kim, Sang-Jeong;Kwon, Young-Min;Bae, Jong-Soo
    • Journal of Elementary Mathematics Education in Korea / v.14 no.3 / pp.723-744 / 2010
  • The aim of this study was to find the effects of open-ended problems on mathematical creativity and brain function. In this study, one class of first grade students was allocated randomly into two groups. Each group solved different problems: the experimental group solved open-ended problems and the comparison group solved closed problems. Mathematical creativity was tested by a paper test, and brain function was tested by an EEG (electroencephalogram) tester. The results of this study are as follows. First, this study analyzed how effective open-ended problems are for mathematical creativity. This analysis showed that they had a meaningful influence on mathematical creativity (p=0.46). Accordingly, we found that open-ended problems lead students to connect mathematical concepts and ideas and to think in varied ways. Second, this study analyzed the effect of open-ended problems on brain function. This analysis showed that they did not have a statistically meaningful influence on brain function (p=.073), but the experimental group's score was higher than the comparison group's at the post-test. They also had a meaningful influence on the brain attention quotient (left) (p=.007), attention quotient (right) (p=.023), and emotion tendency quotient (p=.025). From these tests, we found that open-ended problems are effective for brain function, especially for attention. With the use of open-ended problems, students showed quick understanding and response. Emotional tendency also developed in the process: because various answers are accepted, the students gain an internal reward in the process of finding an answer. Putting the above results together, we found that open-ended problems are effective for mathematical creativity and brain function.

Multi-Dimensional Reinforcement Learning Using a Vector Q-Net - Application to Mobile Robots

  • Kiguchi, Kazuo;Nanayakkara, Thrishantha;Watanabe, Keigo;Fukuda, Toshio
    • International Journal of Control, Automation, and Systems / v.1 no.1 / pp.142-148 / 2003
  • Reinforcement learning is considered an important tool for robotic learning in unknown or uncertain environments. In this paper, we propose an evaluation function expressed in vector form to realize multi-dimensional reinforcement learning. The novel feature of the proposed method is that learning one behavior induces parallel learning of other behaviors, even though the objectives of the behaviors differ. In brief, all behaviors watch the other behaviors from a critical point of view. Therefore, the proposed method provides cross-criticism and parallel learning, which make the multi-dimensional learning process more efficient. By applying the proposed learning method, we carried out multi-dimensional evaluation (reward) and multi-dimensional learning simultaneously in one trial. A special neural network (Q-net), in which the weights and the output are represented by vectors, is proposed to realize a critic network for Q-learning. The proposed learning method is applied to behavior planning of mobile robots.
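
To illustrate what a vector-valued evaluation function can look like, the sketch below uses a tabular simplification rather than the paper's vector Q-net: each state-action pair stores one Q-value per objective, and a single transition with a vector reward updates every component at once. The function names, the table layout, and the per-component greedy bootstrap are assumptions made for illustration.

```python
from collections import defaultdict


def make_q(num_actions, num_objectives):
    """Q-table mapping state -> list over actions -> list over objectives."""
    return defaultdict(
        lambda: [[0.0] * num_objectives for _ in range(num_actions)]
    )


def vector_q_update(q, s, a, reward_vec, s_next, lr=0.1, gamma=0.9):
    """Apply one Q-learning step to every objective from a single transition.

    Assumption: each objective k bootstraps from its own greedy value
    max_b Q_k(s_next, b); the paper's network may combine them differently.
    """
    for k, r_k in enumerate(reward_vec):
        best_next = max(q[s_next][b][k] for b in range(len(q[s_next])))
        target = r_k + gamma * best_next
        q[s][a][k] += lr * (target - q[s][a][k])
```

One call such as `vector_q_update(q, s=0, a=1, reward_vec=[1.0, -1.0], s_next=1)` moves both objectives' estimates for the same state-action pair, which is the sense in which one trial trains several behaviors in parallel.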

On the Bayesian Decision Making Model of 2-Person Coordination Game (2인 조정게임의 베이지안 의사결정모형)

  • 김정훈;정민용
    • Journal of the Korean Operations Research and Management Science Society / v.22 no.3 / pp.113-143 / 1997
  • Most conflict problems between 2 persons can be represented as a bi-matrix game, because the players' utilities are, in general, non-zero sum and change as the game progresses. In the bi-matrix game, the set of equilibrium points that satisfies Pareto optimality can be a good bargaining or coordination solution. Under incomplete information about the risk attitudes of the players, the bargaining or coordination solution depends on additional elements, namely, the players' methods of making inferences when they reach a node in the extensive form of the game that is off the equilibrium path. Investigating the players' inference types and their effects on the solution is therefore essential. In addition, the effect of an individual's aversion to risk, as expressed in his or her utility function, on the various solutions of conflict problems must be considered. Such incomplete information makes the decision maker Bayesian, since it is often impossible to get correct information for building a decision-making model. From a Bayesian point of view, this paper presents an analytic framework for guessing and learning the opponent's attitude to risk in order to obtain a better reward. As an example of that analytic framework, a 2-person bi-matrix game is considered. This example explains that a bi-matrix game can be transformed into a kind of matrix game through the players' implicitly cooperative attitude and the need for arbitration.
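
The idea of learning an opponent's risk attitude from observed play can be sketched with a plain Bayes-rule update over a discrete set of attitude types. This is only an illustration of the general mechanism, not the paper's model: the type labels, the action names, and the likelihood table are hypothetical.

```python
def bayes_update(prior, likelihood, observed_action):
    """Return the posterior over opponent types after one observed action.

    prior:      dict type -> P(type)
    likelihood: dict type -> dict action -> P(action | type)
    """
    unnormalized = {
        t: prior[t] * likelihood[t][observed_action] for t in prior
    }
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("observed action has zero probability under every type")
    return {t: v / evidence for t, v in unnormalized.items()}


# Hypothetical example: two risk attitudes and two actions.
prior = {"risk_averse": 0.5, "risk_seeking": 0.5}
likelihood = {
    "risk_averse": {"safe": 0.8, "risky": 0.2},
    "risk_seeking": {"safe": 0.3, "risky": 0.7},
}
posterior = bayes_update(prior, likelihood, "safe")
```

Feeding each posterior back in as the next prior gives a running estimate of the opponent's attitude as the game proceeds.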

Traffic Prediction based Multi-Stage Virtual Topology Reconfiguration Policy in Multi-wavelength Routed Optical Networks (다중 파장 광 네트워크 상에서 트래픽 예상 기법 기반 다단계 가상망 재구성 정책)

  • Lin Zhang;Lee, Kyung-hee;Youn, Chan-Hyun;Shim, Eun-Bo
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.8C / pp.729-740 / 2002
  • This paper studies the issues arising in the virtual topology reconfiguration phase of multi-wavelength routed optical networks. The reconfiguration process changes the virtual topology in response to changing traffic patterns in the higher layer. We formulate the optimal reconfiguration policy as a multi-stage decision-making problem to maximize the expected reward and cost function over an infinite horizon. We then propose a new heuristic algorithm based on node exchange to reconfigure the virtual topology to meet the traffic requirement. To counter the continual approximation problem introduced by the heuristic approach, we take traffic prediction into consideration. We further propose a new heuristic reconfiguration algorithm, called the Prediction based Multi-stage Reconfiguration approach, to realize the optimal reconfiguration policy based on predicted traffic. Simulation results show that our reconfiguration policy significantly outperforms the conventional one, while the required physical resources are limited.
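
A multi-stage decision problem over an infinite horizon is commonly written as a Markov decision process and solved by value iteration. The sketch below shows that generic technique on a toy model and is not the paper's heuristic: the discount factor, the two-topology state space, and the reward and reconfiguration-cost numbers are all assumptions.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Compute optimal discounted values for a small finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs;
    rewards[s][a] is the immediate reward (already net of any cost).
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = max(
                rewards[s][a]
                + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
                for a in transitions[s]
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical toy model: keep the old topology, or pay a one-off cost to
# reconfigure to a topology that serves the current traffic better.
transitions = {
    "old": {"keep": [(1.0, "old")], "reconfigure": [(1.0, "new")]},
    "new": {"keep": [(1.0, "new")]},
}
rewards = {
    "old": {"keep": 1.0, "reconfigure": 0.5},  # 0.5 = 2.0 reward - 1.5 cost
    "new": {"keep": 2.0},
}
values = value_iteration(transitions, rewards)
```

In this toy model, reconfiguring is worth its one-off cost because the higher per-stage reward of the new topology accumulates over the remaining stages.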

Formalizing the Design, Evaluation, and Analysis of Quality of Protection in Wireless Networks

  • Lim, Sun-Hee;Yun, Seung-Hwan;Lim, Jong-In;Yi, Ok-Yeon
    • Journal of Communications and Networks / v.11 no.6 / pp.634-644 / 2009
  • A diversity of wireless networks, with rapidly evolving wireless technology, is currently in service. Due to their innate physical layer vulnerability, wireless networks require enhanced security components. WLAN, WiBro, and UMTS have defined security components that meet standard security requirements. Extensive research has been conducted to enhance the security of individual wireless platforms, and meaningful results are now at hand. However, with the advent of ubiquitous service, new horizontal platform service models with vertical cross-layer security are expected to be proposed. Research on synchronized security service and interoperability in a heterogeneous environment must be conducted. In heterogeneous environments, a quantitative evaluation model of security policy in wireless networks is required to design balanced security components. To design an appropriate evaluation method for security policies in heterogeneous wireless networks, we formalize the security properties of wireless networks. As the benefit of security protocols is indicated by the quality of protection (QoP), we improve the QoP model and evaluate a hybrid security policy in heterogeneous wireless networks by applying it to the QoP model. By deriving relative indicators from the positive impact of security points, and using these indicators to quantify a total reward function, this paper helps to provide an appropriate benchmark for combined security components in wireless networks.

Mean Field Game based Reinforcement Learning for Weapon-Target Assignment (평균 필드 게임 기반의 강화학습을 통한 무기-표적 할당)

  • Shin, Min Kyu;Park, Soon-Seo;Lee, Daniel;Choi, Han-Lim
    • Journal of the Korea Institute of Military Science and Technology / v.23 no.4 / pp.337-345 / 2020
  • The Weapon-Target Assignment (WTA) problem can be formulated as an optimization problem that minimizes the threat of targets. Existing methods consider the trade-off between optimality and execution time to meet various mission objectives. We propose a multi-agent reinforcement learning algorithm for WTA based on the mean field game to solve the problem in real time with nearly optimal accuracy. The mean field game is a recent method introduced to relieve the curse of dimensionality in multi-agent learning algorithms. In addition, previous reinforcement learning models for WTA generally do not consider weapon interference, which may be critical in real-world operations. Therefore, we modify the reward function to discourage the crossing of weapon trajectories. The feasibility of the proposed method was verified through simulation of a WTA problem with multiple targets in real time; the proposed algorithm can assign weapons to all targets without crossing weapon trajectories.
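
One way to picture a reward that discourages crossing trajectories is to subtract a penalty for every pair of weapon-to-target paths that intersect. The sketch below is a hypothetical illustration, not the reward used in the paper: it assumes straight-line 2D trajectories, a per-target threat value, and a fixed `crossing_penalty`, and it only detects proper crossings (it ignores paths that merely touch at an endpoint or overlap collinearly).

```python
from itertools import combinations


def _orient(p, q, r):
    """Signed area test: > 0 if r is left of the directed line p -> q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a1, a2, b1, b2):
    """True if segment a1-a2 and segment b1-b2 properly intersect."""
    d1 = _orient(b1, b2, a1)
    d2 = _orient(b1, b2, a2)
    d3 = _orient(a1, a2, b1)
    d4 = _orient(a1, a2, b2)
    return d1 * d2 < 0 and d3 * d4 < 0


def assignment_reward(weapons, targets, assignment, threat, crossing_penalty=1.0):
    """Threat removed by the engaged targets minus a penalty per crossing.

    assignment[i] is the index of the target engaged by weapon i.
    """
    base = sum(threat[t] for t in set(assignment))
    crossings = sum(
        segments_cross(
            weapons[i], targets[assignment[i]],
            weapons[j], targets[assignment[j]],
        )
        for i, j in combinations(range(len(assignment)), 2)
    )
    return base - crossing_penalty * crossings
```

For two weapons at (0, 0) and (1, 0) and two targets at (0, 1) and (1, 1), the straight-ahead assignment has no crossing, while the swapped assignment produces one crossing and a lower reward.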