• Title/Summary/Keyword: 학습강화 (learning reinforcement)

A Study about the Usefulness of Reinforcement Learning in Business Simulation Games using PPO Algorithm (경영 시뮬레이션 게임에서 PPO 알고리즘을 적용한 강화학습의 유용성에 관한 연구)

  • Liang, Yi-Hong;Kang, Sin-Jin;Cho, Sung Hyun
    • Journal of Korea Game Society / v.19 no.6 / pp.61-70 / 2019
  • In this paper, we apply reinforcement learning to a management simulation game to check whether game agents can autonomously achieve a given goal. In this system, we apply the PPO (Proximal Policy Optimization) algorithm in the Unity Machine Learning (ML) Agent environment, and the game agent is designed to automatically find a way to play. Five game scenario simulation experiments were conducted to verify its usefulness. The results confirm that the game agent achieves the goal through learning despite changes to the environment variables in the game.

Robot Control via RPO-based Reinforcement Learning Algorithm (RPO 기반 강화학습 알고리즘을 이용한 로봇제어)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.4 / pp.505-510 / 2005
  • The RPO (randomized policy optimizer) algorithm, which uses a probabilistic policy for action selection, is a recently developed tool in the area of reinforcement learning and has been shown to be very successful in several application problems. In this paper, we propose a modified RPO algorithm whose critic network is adapted via the RLS (Recursive Least Squares) algorithm. To illustrate the applicability of the modified RPO method, we applied it to Kimura's robot and observed very good performance. We also developed a MATLAB-based animation program with which the effect of the training algorithms on the acceleration of the robot movement was observed.

Generation of Ship's Optimal Route based on Q-Learning (Q-러닝 기반의 선박의 최적 경로 생성)

  • Hyeong-Tak Lee;Min-Kyu Kim;Hyun Yang
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.160-161 / 2023
  • Currently, a ship's passage planning relies on the navigation officer's knowledge and empirical methods. However, as autonomous ship navigation technology has recently developed, automation of passage planning has been studied in various ways. In this study, we intend to generate an optimal route for a ship based on Q-learning, one of the reinforcement learning techniques. Reinforcement learning is applied so that the agent learns from experience across various situations and makes optimal decisions based on that experience.
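
The abstract above does not give the state, action, or reward design, so the following is only a minimal sketch of tabular Q-learning for route generation under assumptions of our own: a hypothetical 5x5 chart (`GRID`), four compass moves, a step cost of -1, a penalty of -5 for hitting an obstacle or the chart boundary, and a reward of +10 at the goal. None of these names or values come from the paper.

```python
import random

# Hypothetical 5x5 chart: 0 = open water, 1 = obstacle (assumed, not from the paper).
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east


def step(state, action):
    """Move one cell; leaving the chart or hitting an obstacle keeps the ship in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < 5 and 0 <= c < 5) or GRID[r][c] == 1:
        return state, -5.0, False  # assumed penalty for an invalid move
    if (r, c) == GOAL:
        return (r, c), 10.0, True  # assumed reward for arriving
    return (r, c), -1.0, False     # assumed cost per move


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration and the Q-learning update."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(5) for c in range(5)}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q[state][i])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_route(q, max_steps=50):
    """Follow the highest-valued action from START until GOAL or the step cap."""
    route, state = [START], START
    while state != GOAL and len(route) <= max_steps:
        a = max(range(4), key=lambda i: q[state][i])
        state, _, _ = step(state, ACTIONS[a])
        route.append(state)
    return route
```

Calling `greedy_route(train())` returns the list of cells the learned policy visits from `START` to `GOAL`. A real passage-planning setup would need a richer state (heading, depth, traffic) and a reward reflecting navigational constraints, which this sketch does not attempt.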

Enhancing Technology Learning Capabilities for Catch-up and Post Catch-up Innovations (기술학습역량 강화를 통한 추격 및 탈추격 혁신 촉진)

  • Bae, Zong-Tae;Lee, Jong-Seon;Koo, Bonjin
    • The Journal of Small Business Innovation / v.19 no.2 / pp.53-68 / 2016
  • Motivation and activities for technological learning, entrepreneurship, innovation, and creativity are driving forces of economic development in Asian countries. In the early stages of technological development, technological learning and entrepreneurship are efficient ways to catch up with advanced countries because firms can accumulate skills and knowledge quickly at relatively low risk. In the later stages of technological development, however, innovation and creativity become more important. This study aims to identify a) the factors (learning capabilities) that influence technological learning performance and b) barriers to enhancing innovation capabilities for the creative economy and organizations. The major part of this study is related to learning capabilities in the post-catch-up era. Based on a literature review and observations from Korean experiences, this study proposes a technological learning model composed of various factors that influence technological learning. Three hypotheses are derived, and data are collected from Korean machine tool manufacturers. Intensive interviews with CEOs and R&D directors are conducted using structured questionnaires. Statistical analyses, such as correlation and ANOVA, are then carried out. Furthermore, this study addresses how to enhance innovation capabilities to move forward. Innovation enablers and barriers are identified by case studies and policy analysis. The results of the empirical study identify several levels of firms' learning capabilities and activities, such as a) stock of technology, b) potential of technical labor, c) explicit technological efforts, d) readiness to learn, e) top management support, f) a formal technological learning system, g) high learning motivation, h) appropriate technology choice, and i) specific goal setting. These learning capabilities determine firms' learning performance, especially in the early stages of development.
Furthermore, it is found that the critical factors for successful technological learning vary across the stages of technology development. Through the statistical and policy analyses, this study confirms that technological learning can be understood as an intrinsic principle of the technology development process. Firms perform proactive and creative learning in the late stages, while reactive and imitative learning prevails in the early stages. In addition, this study identifies the driving forces or facilitating factors that enhance innovation performance in the post-catch-up era. The results of the preliminary case studies and policy analysis show some facilitating factors, such as a) the strategic intent of the CEO and corporate culture, b) leadership and change agents, c) design principles and routines, d) ecosystem and collaboration with partners, and e) intensive R&D investment.

A Study on the Development and Validation of Home Economics Teaching-Learning Materials for Critical Multicultural Education : Focusing on Media Literacy (비판적 다문화교육을 위한 가정과 교수.학습 자료 개발 및 타당화 연구 : 미디어 리터러시를 중심으로)

  • Kim, Seo-Hyun;Chin, Mee-Jung
    • Journal of Korean Home Economics Education Association / v.24 no.3 / pp.1-34 / 2012
  • The objectives of this study are to introduce a critical perspective on multicultural education into home economics education, to develop teaching-learning materials, and to validate them by applying them in classes intended to enhance students' multicultural competence. For these purposes, family life culture sections from six high school technology and home economics textbooks were analyzed based on the contents and elements of multicultural competence. After recomposing the family life culture sections, this study developed 12-session teaching-learning materials with an emphasis on media literacy. Of these, 4-session plans were taught in classrooms to 247 students in the 10th grade. To test the validity of the plans, a questionnaire was given to the students as a pre- and post-test. The data were analyzed with paired t-tests. The results showed significant pre- and post-test differences in all sections of multicultural competence except 'general cultural understanding'. This implies that the developed teaching-learning materials were effective in helping students overcome ethnocentrism and better understand cultural differences.

Trading Strategies Using Reinforcement Learning (강화학습을 이용한 트레이딩 전략)

  • Cho, Hyunmin;Shin, Hyun Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.123-130 / 2021
  • With recent developments in computer technology, there has been increasing interest in the field of machine learning. This has also led to a significant increase in real business applications of machine learning theory in various sectors. In finance, predicting the future value of financial products has been a major challenge. Since the 1980s, the finance industry has relied on technical and fundamental analysis for this prediction. For future value prediction models using machine learning, model design is of paramount importance in responding to market variables. Therefore, this paper quantitatively predicts the price movements of individual stocks listed on the KOSPI market using machine learning techniques, specifically reinforcement learning models. The DQN and A2C algorithms proposed by Google DeepMind in 2013 are used for the reinforcement learning and are applied to stock trading strategies. In addition, an input value that increases the cumulative profit is selected through experiments, and its superiority is verified by comparison with other algorithms.

Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policies via Object Manipulation with an Anthropomorphic Robot Hand (휴먼형 로봇 손의 사물 조작 수행을 이용한 사람 데모 결합 강화학습 정책 성능 평가)

  • Park, Na Hyeon;Oh, Ji Heon;Ryu, Ga Hyun;Lopez, Patricio Rivera;Anazco, Edwin Valarezo;Kim, Tae Seong
    • KIPS Transactions on Software and Data Engineering / v.10 no.5 / pp.179-186 / 2021
  • Manipulating complex objects with an anthropomorphic robot hand, as a human hand does, is a challenge in human-centric environments. To train an anthropomorphic robot hand with a high degree of freedom (DoF), human demonstration augmented deep reinforcement learning policy optimization methods have been proposed. In this work, we first demonstrate that augmenting deep reinforcement learning (DRL) with human demonstration is effective for object manipulation by comparing the performance of the augmentation-free Natural Policy Gradient (NPG) and Demonstration Augmented NPG (DA-NPG). Then three DRL policy optimization methods, namely NPG, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO), are evaluated with DA (i.e., DA-NPG, DA-TRPO, and DA-PPO) and without DA by manipulating six objects: an apple, a banana, a bottle, a light bulb, a camera, and a hammer. The results show that DA-NPG achieved an average success rate of 99.33%, whereas NPG achieved only 60%. In addition, DA-NPG succeeded in grasping all six objects, while DA-TRPO and DA-PPO failed to grasp some objects and showed unstable performance.

Optimal deployment of sonobuoy for unmanned aerial vehicles using reinforcement learning considering the target movement (표적의 이동을 고려한 강화학습 기반 무인항공기의 소노부이 최적 배치)

  • Geunyoung Bae;Juhwan Kang;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.214-224 / 2024
  • Sonobuoys are disposable devices that use sound waves to gather information, detect engine noises, and capture various acoustic characteristics. They play a crucial role in accurately detecting underwater targets, making them effective detection systems in anti-submarine warfare. Existing sonobuoy deployment methods in multistatic systems often rely on fixed patterns or heuristic rules, and they are inefficient in the number of sonobuoys deployed and in operational time because underwater targets move unpredictably. This paper therefore proposes an optimal sonobuoy placement strategy for Unmanned Aerial Vehicles (UAVs) to overcome the limitations of conventional deployment methods. The proposed approach uses reinforcement learning in a simulation-based experimental environment that considers the movements of the underwater targets. The Unity ML-Agents framework is employed, and the Proximal Policy Optimization (PPO) algorithm is used for UAV learning in a virtual operational environment with real-time interactions. The reward function is designed to consider the number of sonobuoys deployed and the cost associated with sound sources and receivers, enabling effective learning. Compared with conventional sonobuoy deployment methods in the same experimental environment, the proposed reinforcement learning-based strategy demonstrates superior performance in detection success rate, deployed sonobuoy count, and operational time.

Fuzzy Q-learning using Distributed Eligibility (분포 기여도를 이용한 퍼지 Q-learning)

  • 정석일;이연정
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.5 / pp.388-394 / 2001
  • Reinforcement learning is a kind of unsupervised learning method in which an agent learns control rules from experiences acquired through interactions with the environment. Eligibility is used to resolve the credit-assignment problem, which is one of the important problems in reinforcement learning. Conventional eligibilities, such as the accumulating eligibility and the replacing eligibility, make ineffective use of the rewards acquired in the learning process, since only the one executed action for a visited state is learned. In this paper, we propose a new eligibility, called the distributed eligibility, with which not only the executed action but also neighboring actions in a visited state are learned. The fuzzy Q-learning algorithm using the proposed eligibility is applied to a cart-pole balancing problem, which shows the superiority of the proposed method over conventional methods in terms of learning speed.

Behavior Learning and Evolution of Swarm Robot System using Q-learning and Cascade SVM (Q-learning과 Cascade SVM을 이용한 군집로봇의 행동학습 및 진화)

  • Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.2 / pp.279-284 / 2009
  • In swarm robot systems, each robot must behave by itself according to its states and environments and, if necessary, must cooperate with other robots to carry out a given task. Therefore, it is essential that each robot has both learning and evolution abilities to adapt to dynamic environments. In this paper, a reinforcement learning method using multiple SVMs based on structural risk minimization, together with distributed genetic algorithms, is proposed for behavior learning and evolution of collective autonomous mobile robots. With the distributed genetic algorithm, robots exchange chromosomes acquired under different environments through communication, so each robot can improve its behavior ability. In particular, to improve the performance of evolution, selective crossover that uses the characteristics of reinforcement learning based on Cascade SVM is adopted in this paper.