• Title/Summary/Keyword: Potential-based Reinforcement Learning

Search results: 23

Potential-based Reinforcement Learning Combined with Case-based Decision Theory (사례 기반 결정 이론을 융합한 포텐셜 기반 강화 학습)

  • Kim, Eun-Sun;Chang, Hyeong-Soo
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.978-982 / 2009
  • This paper proposes a potential-based reinforcement learning method, called "RLs-CBDT", which combines multiple RL agents with case-based decision theory, a theory designed for decision making in uncertain environments, used here as expert knowledge within the RL framework. We empirically show through a Tetris experiment that RLs-CBDT converges to an optimal policy faster than pre-existing RL algorithms.
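The "potential-based" approach named in this and the following entries builds on potential-based reward shaping, in which a shaping term F(s, a, s') = γΦ(s') − Φ(s) is added to the environment reward without changing the optimal policy. A minimal sketch on a toy chain MDP follows; the environment, the potential function Φ, and all hyperparameters are illustrative assumptions, not taken from the paper:

```python
import random

# Toy chain MDP: states 0..4, actions {0: left, 1: right}; reward 1 on reaching state 4.
N_STATES, GOAL, GAMMA, ALPHA, EPS = 5, 4, 0.9, 0.5, 0.1

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Potential function (an assumption for illustration): higher closer to the goal.
def phi(s):
    return s / GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)
for _ in range(200):
    s = 0
    while s != GOAL:
        a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Potential-based shaping: F = gamma * phi(s') - phi(s) preserves the optimal policy
        # while giving the agent denser feedback than the sparse goal reward alone.
        shaped = r + GAMMA * phi(s2) - phi(s)
        Q[s][a] += ALPHA * (shaped + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should move right in every non-terminal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
```

The shaping term leaves the optimal policy of the underlying MDP intact, which is the property the convergence guarantees cited in these abstracts rely on.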

A Dynamic Channel Assignment Method in Cellular Networks Using Reinforcement learning Method that Combines Supervised Knowledge (감독 지식을 융합하는 강화 학습 기법을 사용하는 셀룰러 네트워크에서 동적 채널 할당 기법)

  • Kim, Sung-Wan;Chang, Hyeong-Soo
    • Journal of KIISE:Computing Practices and Letters / v.14 no.5 / pp.502-506 / 2008
  • The recently proposed "potential-based" reinforcement learning (RL) method makes it possible to combine multiple learned policies and expert advice as supervised knowledge within an RL framework, and its effectiveness has been established by a theoretical guarantee of convergence to an optimal policy. In this paper, the potential-based RL method is applied to the dynamic channel assignment (DCA) problem in cellular networks. It is empirically shown that potential-based RL assigns channels more efficiently than fixed channel assignment, Maxavail, and Q-learning-based DCA, and that it converges to an optimal policy more rapidly than the other RL algorithms SARSA(0) and PRQ-learning.

A Comparison Study on Reinforcement Learning Method that Combines Supervised Knowledge (감독 지식을 융합하는 강화 학습 기법들에 대한 비교 연구)

  • Kim, S.W.;Chang, H.S.
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.303-308 / 2007
  • The effectiveness of potential-based RL, a recently proposed reinforcement learning method that incorporates supervised knowledge, has been proven by a theoretical guarantee of convergence to an optimal policy, and the merit of policy-reuse RL has been demonstrated through empirical comparison with conventional RL methods that do not incorporate supervised knowledge; however, no study has yet compared policy-reuse RL with potential-based RL directly. In this paper, through an empirical performance comparison, we show that potential-based RL converges faster than policy-reuse RL, and that the performance of policy-reuse RL depends on the optimality of the reused policy.


A Basic Research on the Development and Performance Evaluation of Evacuation Algorithm Based on Reinforcement Learning (강화학습 기반 피난 알고리즘 개발과 성능평가에 관한 기초연구)

  • Kwang-il Hwang;Byeol Kim
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.132-133 / 2023
  • The safe evacuation of people during disasters is of utmost importance. Various life-safety evacuation simulation tools have been developed and implemented, most relying on algorithms that analyze maps to extract the shortest path and guide agents along predetermined routes. While effective for predicting evacuation routes under stable disaster conditions and short timeframes, this approach falls short in dynamic situations where the disaster scenario constantly changes. Because existing algorithms struggle to respond to such scenarios, a more adaptive evacuation route algorithm is needed, and artificial intelligence technology based on reinforcement learning holds the potential to provide one. As a fundamental step in algorithm development, this study evaluates whether an evacuation algorithm developed by reinforcement learning satisfies the performance requirements for evacuation simulation tools set out in IMO MSC.1/Circ.1533.


A Survey on Recent Advances in Multi-Agent Reinforcement Learning (멀티 에이전트 강화학습 기술 동향)

  • Yoo, B.H.;Ningombam, D.D.;Kim, H.W.;Song, H.J.;Park, G.M.;Yi, S.
    • Electronics and Telecommunications Trends / v.35 no.6 / pp.137-149 / 2020
  • Several multi-agent reinforcement learning (MARL) algorithms have achieved overwhelming results in recent years, demonstrating their potential for solving complex problems in real-time strategy online games, robotics, and autonomous vehicles. However, these algorithms face many challenges when dealing with massive problem spaces in sparse-reward environments. Based on the centralized training and decentralized execution (CTDE) architecture, the MARL algorithms discussed in the literature aim to address the current challenges by formulating novel approaches to inter-agent modeling, credit assignment, multi-agent communication, and the exploration-exploitation dilemma. The fundamental objective of this paper is to deliver a comprehensive survey of existing MARL algorithms organized by problem statement rather than by technology. We also discuss several experimental frameworks to provide insight into the use of these algorithms and to motivate promising directions for future research.

A Study on Cathodic Protection Rectifier Control of City Gas Pipes using Deep Learning (딥러닝을 활용한 도시가스배관의 전기방식(Cathodic Protection) 정류기 제어에 관한 연구)

  • Hyung-Min Lee;Gun-Tek Lim;Guy-Sun Cho
    • Journal of the Korean Institute of Gas / v.27 no.2 / pp.49-56 / 2023
  • As AI (Artificial Intelligence) technologies have advanced rapidly with the 4th industrial revolution, AI is being applied in a growing number of fields. The main reasons are that there are practical limits to the direct processing and analysis of exponentially growing data as information and communication technology develops, and that applying new technologies can reduce the risk of human error. In this study, an AI agent was trained on data collected from the "remote potential measurement terminal" (T/B, Test Box) together with the output of the "remote rectifier" at the time of each measurement. The training data were augmented through regression analysis of the initially collected data, and the learning model applied was the value-based Q-learning model among deep reinforcement learning (DRL) algorithms. The trained AI was then deployed in an actual city gas supply area, where it was verified that the AI responds appropriately to the received remote T/B data; through this, the study aims to show that AI can serve as a suitable means of cathodic protection management in the future.
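The value-based Q-learning named above can be illustrated with a toy version of such a control loop. The discretized potential bands, the action set, the reward, and every constant below are invented for the sketch and are not taken from the paper:

```python
import random

# Hypothetical toy model: the state is a discretized pipe-to-soil potential band,
# the action lowers/holds/raises the rectifier output, and the reward penalizes
# deviation from the protected target band. All numbers are illustrative.
BANDS = 7              # bands 0..6, where band 3 is the protected target
ACTIONS = (-1, 0, 1)   # lower / hold / raise rectifier output
GAMMA, ALPHA, EPS = 0.9, 0.3, 0.1

def step(band, a):
    nxt = min(BANDS - 1, max(0, band + ACTIONS[a]))
    return nxt, -abs(nxt - 3)  # penalty grows with distance from the target band

Q = [[0.0, 0.0, 0.0] for _ in range(BANDS)]
random.seed(1)
band = 0
for _ in range(5000):
    # Epsilon-greedy action selection over the three rectifier adjustments.
    a = random.randrange(3) if random.random() < EPS else max(range(3), key=lambda x: Q[band][x])
    nxt, r = step(band, a)
    # Standard tabular Q-learning update.
    Q[band][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[band][a])
    band = nxt

# Learned greedy policy: raise below the target band, hold at it, lower above it.
policy = [max(range(3), key=lambda a: Q[b][a]) for b in range(BANDS)]
```

The paper's agent works from real T/B measurements rather than a simulated band model, but the update rule is the same value-based one.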

Proximal Policy Optimization Reinforcement Learning based Optimal Path Planning Study of Surion Agent against Enemy Air Defense Threats (근접 정책 최적화 기반의 적 대공 방어 위협하 수리온 에이전트의 최적 기동경로 도출 연구)

  • Jae-Hwan Kim;Jong-Hwan Kim
    • Journal of the Korea Society for Simulation / v.33 no.2 / pp.37-44 / 2024
  • The Korean Helicopter Development Program has successfully introduced the Surion helicopter, a versatile multi-domain operational aircraft that replaces the aging UH-1 and 500MD helicopters. Specifically designed for maneuverability, the Surion plays a crucial role in low-altitude tactical maneuvers for personnel transportation and specific missions, emphasizing the helicopter's survivability. Despite the significance of its low-altitude tactical maneuver capability, there is a notable gap in research focusing on multi-mission tactical maneuvers that consider the risk factors associated with deploying the Surion in the presence of enemy air defenses. This study addresses this gap by exploring a method to enhance the Surion's low-altitude maneuvering paths, incorporating information about enemy air defenses. Leveraging the Proximal Policy Optimization (PPO) algorithm, a reinforcement learning-based approach, the research aims to optimize the helicopter's path planning. Visualized experiments were conducted using a Surion model implemented in the Unity environment and ML-Agents library. The proposed method resulted in a rapid and stable policy convergence for generating optimal maneuvering paths for the Surion. The experiments, based on two key criteria, "operation time" and "minimum damage," revealed distinct optimal paths. This divergence suggests the potential for effective tactical maneuvers in low-altitude situations, considering the risk factors associated with enemy air defenses. Importantly, the Surion's capability for remote control in all directions enhances its adaptability in complex operational environments.

Understanding of Generative Artificial Intelligence Based on Textual Data and Discussion for Its Application in Science Education (텍스트 기반 생성형 인공지능의 이해와 과학교육에서의 활용에 대한 논의)

  • Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.307-319 / 2023
  • This study aims to explain the key concepts and principles of text-based generative artificial intelligence (AI), which has seen rapidly growing interest and use, focusing on its application in science education. It also highlights the potential and limitations of generative AI in science education, providing insights for its implementation and for research. Recent generative AI, predominantly based on transformer models consisting of encoders and decoders, has made remarkable progress through reinforcement learning with reward models trained on human feedback, as well as improved handling of context. In particular, such models can perform functions such as writing, summarizing, keyword extraction, evaluation, and feedback, based on their ability to understand varied user questions and intents. They also offer practical utility in diagnosing learners and structuring educational content from examples provided by educators. However, it is necessary to examine the limitations of generative AI, including the potential for conveying inaccurate facts or knowledge, bias resulting from overconfidence, and uncertainty about its impact on user attitudes or emotions. Moreover, the responses generative AI provides are probabilistic, derived from the response data of many individuals, which raises concerns about limiting the insightful and innovative thinking that different perspectives or ideas may offer. In light of these considerations, this study provides practical suggestions for the positive utilization of AI in science education.

Smartphone-User Interactive based Self Developing Place-Time-Activity Coupled Prediction Method for Daily Routine Planning System (일상생활 계획을 위한 스마트폰-사용자 상호작용 기반 지속 발전 가능한 사용자 맞춤 위치-시간-행동 추론 방법)

  • Lee, Beom-Jin;Kim, Jiseob;Ryu, Je-Hwan;Heo, Min-Oh;Kim, Joo-Seuk;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.154-159 / 2015
  • Over the past few years, user needs in the smartphone application market have shifted from diversity toward intelligence. Here, we propose a novel cognitive agent that plans users' daily routines using lifelog data collected by their smartphones. The proposed method first employs a DPGMM (Dirichlet Process Gaussian Mixture Model) to automatically extract each user's POIs (Points of Interest) from the lifelog data. The POIs and other meaningful features, such as GPS coordinates and the user's activity labels extracted from the log data, are then used to learn the patterns of the user's daily routine with a POMDP (Partially Observable Markov Decision Process). To identify the significant patterns among the user's time-dependent patterns, the SNS application Foursquare was used to record the locations the user visited and the activities the user performed. The method was evaluated by predicting the daily routines of seven users with 3,300 feedback data points. Experimental results showed that daily routine scheduling can be established after seven days of lifelog and feedback data have been collected, demonstrating the potential of place-time-activity coupled daily routine planning systems in the intelligent-application market.
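The DPGMM-based POI extraction step described above can be sketched with scikit-learn's Dirichlet-process variant of BayesianGaussianMixture, standing in for the paper's own implementation. The GPS points below are synthetic and all parameters are illustrative:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic lifelog GPS: three frequently visited places (e.g. home, office, gym).
points = np.vstack([
    rng.normal([37.50, 127.03], 0.001, (60, 2)),
    rng.normal([37.55, 126.97], 0.001, (60, 2)),
    rng.normal([37.48, 127.10], 0.001, (60, 2)),
])

# A Dirichlet-process prior lets the model infer how many POIs are present,
# up to the truncation level n_components, instead of fixing the count upfront.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(points)

labels = dpgmm.predict(points)
n_pois = len(set(labels.tolist()))  # effective number of POIs actually used
```

Each extracted POI (a mixture component) would then become part of the observation space for the downstream POMDP routine model.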

Enhancing Technology Learning Capabilities for Catch-up and Post Catch-up Innovations (기술학습역량 강화를 통한 추격 및 탈추격 혁신 촉진)

  • Bae, Zong-Tae;Lee, Jong-Seon;Koo, Bonjin
    • The Journal of Small Business Innovation / v.19 no.2 / pp.53-68 / 2016
  • Motivation and activities for technological learning, entrepreneurship, innovation, and creativity are driving forces of economic development in Asian countries. In the early stages of technological development, technological learning and entrepreneurship are efficient ways to catch up with advanced countries, because firms can accumulate skills and knowledge quickly at relatively low risk. In the later stages, however, innovation and creativity become more important. This study aims to identify a) the factors (learning capabilities) that influence technological learning performance and b) barriers to enhancing innovation capabilities for the creative economy and creative organizations. The major part of this study concerns learning capabilities in the post-catch-up era. Based on a literature review and observations from Korean experience, this study proposes a technological learning model composed of the various factors that influence technological learning. Three hypotheses are derived, and data are collected from Korean machine tool manufacturers. In-depth interviews with CEOs and R&D directors are conducted using structured questionnaires, and statistical analyses such as correlation analysis and ANOVA are then carried out. Furthermore, this study addresses how to enhance innovation capabilities going forward; innovation enablers and barriers are identified through case studies and policy analysis. The results of the empirical study identify several levels of firms' learning capabilities and activities, such as a) stock of technology, b) potential of technical labor, c) explicit technological efforts, d) readiness to learn, e) top management support, f) a formal technological learning system, g) high learning motivation, h) appropriate technology choice, and i) specific goal setting. These learning capabilities determine firms' learning performance, especially in the early stages of development. It is also found that the critical factors for successful technological learning vary across the stages of technology development. Through the statistical and policy analyses, this study confirms that technological learning can be understood as an intrinsic principle of the technology development process: firms perform proactive and creative learning in the late stages, while reactive and imitative learning prevails in the early stages. In addition, this study identifies the driving forces and facilitating factors that enhance innovation performance in the post-catch-up era. The preliminary case studies and policy analysis reveal facilitating factors such as a) the strategic intent of the CEO and corporate culture, b) leadership and change agents, c) design principles and routines, d) ecosystem and collaboration with partners, and e) intensive R&D investment.
