• Title/Abstract/Keyword: reward function


Triple-state 보상 함수를 기반으로 한 개선된 DSA 기법 (An Improved DSA Strategy based on Triple-States Reward Function)

  • 타사미아;구준롱;장성진;김재명
    • 대한전자공학회논문지TC / Vol. 47, No. 11 / pp. 59-68 / 2010
  • This paper presents a new method that achieves more complete DSA (Dynamic Spectrum Access) by modifying the reward function. POMDP (Partially Observable Markov Decision Process) is an algorithm used to predict future spectrum states, and its reward function is the most important part of spectrum prediction. However, because the conventional reward function has only two states, Busy and Idle, when a collision occurs on a channel it returns Busy, degrading secondary-user performance. This paper therefore splits the conventional Busy state into two states, Busy and Collision; the added Collision state gives the secondary user more channel access opportunities and thereby increases the data rate. The paper also mathematically analyzes the belief vector of the new algorithm. Finally, simulation results verify the performance of the improved reward function and show that the new algorithm can improve secondary-user performance in CR networks.
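A minimal sketch of the triple-state idea in Python; the state encoding and reward magnitudes below are illustrative assumptions, not values from the paper:

```python
# Hypothetical reward magnitudes; the paper defines the states, not these numbers.
IDLE, BUSY, COLLISION = "idle", "busy", "collision"

def observe_channel(primary_active: bool, secondary_transmitted: bool) -> str:
    """Classify a sensing slot into one of the three states."""
    if not primary_active:
        return IDLE
    # Primary user occupies the channel; did the secondary user also transmit?
    return COLLISION if secondary_transmitted else BUSY

def triple_state_reward(state: str) -> float:
    """Separating COLLISION from BUSY keeps an ordinary busy slot from
    being punished as hard as an actual collision, preserving the
    secondary user's access opportunities."""
    return {IDLE: 1.0, BUSY: -0.2, COLLISION: -1.0}[state]
```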

시연에 의해 유도된 탐험을 통한 시각 기반의 물체 조작 (Visual Object Manipulation Based on Exploration Guided by Demonstration)

  • 김두준;조현준;송재복
    • 로봇학회논문지 / Vol. 17, No. 1 / pp. 40-47 / 2022
  • A reward function suitable for the task is required to manipulate objects through reinforcement learning. However, it is difficult to design the reward function if sufficient information about the objects cannot be obtained. In this study, a demonstration-based object manipulation algorithm called stochastic exploration guided by demonstration (SEGD) is proposed to solve this reward design problem. SEGD is a reinforcement learning algorithm in which a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD) are added to soft actor-critic (SAC). SRE ensures the training of the SAC critic by collecting prior data, and IPD limits the exploration space by keeping SEGD's actions close to the expert's. Through these two components, SEGD can learn from only the sparse reward of the task, without a hand-designed reward function. To verify SEGD, experiments were conducted on three tasks, in which it achieved success rates above 96.5%.
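The demonstration-guided exploration of IPD can be sketched as a simple interpolation between the learner's action and the expert's; the blending coefficient and its schedule are assumptions for illustration, not the paper's actual formulation:

```python
def interpolated_action(policy_action, expert_action, alpha):
    """Blend the learner's action toward the expert's demonstration.
    alpha=1 follows the expert exactly; alpha=0 follows the policy.
    Decaying alpha over training (schedule not specified here) shrinks
    the exploration space early on, as IPD is described to do."""
    return [(1.0 - alpha) * p + alpha * e
            for p, e in zip(policy_action, expert_action)]
```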

종방향 주행성능향상을 위한 Latent SAC 강화학습 보상함수 설계 (On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance)

  • 조성빈;정한유
    • 전기전자학회논문지 / Vol. 25, No. 4 / pp. 728-734 / 2021
  • Interest in end-to-end autonomous driving using deep reinforcement learning has recently grown rapidly. This paper presents a reward function for latent SAC-based deep reinforcement learning that improves a vehicle's longitudinal driving performance. Whereas existing reinforcement learning reward functions severely degrade driving safety and efficiency, the proposed reward function is shown to maintain an appropriate inter-vehicle distance while avoiding collision risk with the vehicle ahead.
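As a rough illustration of a longitudinal reward that trades off collision avoidance against headway keeping (all gap thresholds and weights below are hypothetical, not the paper's design):

```python
def longitudinal_reward(gap_m: float, rel_speed_mps: float,
                        desired_gap_m: float = 30.0,
                        collision_gap_m: float = 2.0) -> float:
    """rel_speed_mps = lead speed - ego speed; negative means closing in.
    Near-collision gaps get a large fixed penalty; otherwise penalize
    deviation from the desired headway and fast approach."""
    if gap_m <= collision_gap_m:
        return -10.0
    gap_term = -abs(gap_m - desired_gap_m) / desired_gap_m
    closing_term = -0.1 * max(0.0, -rel_speed_mps)
    return gap_term + closing_term
```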

다중 교차로에서 협동적 신호제어를 위한 보상함수 설계 (Designing Reward Function for Cooperative Traffic Signal Control at Multi-intersection)

  • 배요한;장진헌;송문혁
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2022년도 추계학술대회 / pp. 110-113 / 2022
  • Traffic signal control has advanced beyond optimization based on traditional mathematical methods to the stage where artificial intelligence is beginning to be applied in earnest. Accordingly, various studies are investigating how to apply AI, but current research mostly sets the reward function simply in terms of delay, without proper consideration of what constitutes good traffic conditions. This carries the risk that the AI learns an unrealistic signal control policy, and what the reward function rates as good can fail to match the practical definition of level of service. This study therefore analyzes existing reward function designs and proposes directions for improvement.
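One way to move beyond a delay-only reward, in the direction the authors argue for, is to mix several level-of-service-related terms into the scalar reward; the terms and weights below are purely illustrative:

```python
def signal_reward(delay_s: float, queue_len: int, throughput: int,
                  w_delay: float = 1.0, w_queue: float = 0.5,
                  w_tp: float = 0.2) -> float:
    """Delay-only rewards can rate unrealistic control highly; adding
    queue-length and throughput terms is one way to pull the reward
    toward a level-of-service notion of 'good'. Weights are illustrative."""
    return -w_delay * delay_s - w_queue * queue_len + w_tp * throughput
```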


Comparative analysis of activation functions within reinforcement learning for autonomous vehicles merging onto highways

  • Dongcheul Lee;Janise McNair
    • International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 1 / pp. 63-71 / 2024
  • Deep reinforcement learning (RL) significantly influences autonomous vehicle development by optimizing decision-making and adaptation to complex driving environments through simulation-based training. Deep RL relies on activation functions, and although many have been proposed, their performance varies greatly depending on the application environment. Finding the optimal activation function for the environment is therefore important for effective learning. In this paper, we analyze nine commonly used activation functions to compare and evaluate which is most effective when deep RL is used to teach autonomous vehicles highway merging. To do this, we built a performance evaluation environment and compared the average reward obtained with each activation function. The results showed that the highest reward was achieved using Mish and the lowest using SELU, with a 10.3% difference in reward between the two.
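For reference, the standard definitions of the best- and worst-performing activations in this comparison, Mish and SELU, can be written directly:

```python
import math

def softplus(x: float) -> float:
    return math.log1p(math.exp(x))

def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x)) -- the top performer in this comparison."""
    return x * math.tanh(softplus(x))

def selu(x: float, alpha: float = 1.6732632423543772,
         scale: float = 1.0507009873554805) -> float:
    """SELU with its standard self-normalizing constants -- the lowest performer here."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))
```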

Weight Adjustment Scheme Based on Hop Count in Q-routing for Software Defined Networks-enabled Wireless Sensor Networks

  • Godfrey, Daniel;Jang, Jinsoo;Kim, Ki-Il
    • Journal of information and communication convergence engineering / Vol. 20, No. 1 / pp. 22-30 / 2022
  • The reinforcement learning algorithm has proven its potential in solving sequential decision-making problems under uncertainty, such as finding paths to route data packets in wireless sensor networks. With reinforcement learning, computing the optimum path requires careful definition of the reward function, typically a linear function that aggregates multiple objectives into a single numerical value (reward) to be maximized. In a typical linear reward function, the objectives are integrated as a weighted sum with fixed weighting factors for all learning agents. This study proposes a reinforcement learning-based routing protocol for wireless sensor networks in which different learning agents prioritize different objectives by assigning different weighting factors to the aggregated objectives of the reward function. We assign weighting factors to the objectives in a sensor node's reward function according to its hop-count distance to the sink node. We expect this approach to enhance the effectiveness of multi-objective reinforcement learning for wireless sensor networks with a balanced trade-off among competing parameters. Furthermore, we propose an SDN (Software Defined Networks) architecture with multiple controllers for constant network monitoring, allowing learning agents to adapt to the dynamics of network conditions. Simulation results show that the proposed scheme enhances the performance of wireless sensor networks under varied conditions, such as node density and traffic intensity, with a good trade-off among competing performance metrics.
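A sketch of the hop-count-based weighting idea; which objective is emphasized at which distance, and the linear schedule, are assumptions for illustration rather than the paper's scheme:

```python
def reward_weights(hop_count: int, max_hops: int) -> dict:
    """Far from the sink: emphasize energy; near the sink: emphasize latency.
    (Which objective dominates where, and the linear schedule, are assumptions.)"""
    ratio = hop_count / max_hops
    return {"energy": ratio, "latency": 1.0 - ratio}

def q_reward(metrics: dict, weights: dict) -> float:
    """Weighted-sum scalarization of the per-objective metrics."""
    return sum(weights[k] * metrics[k] for k in weights)
```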

Reward Shaping for a Reinforcement Learning Method-Based Navigation Framework

  • Roland, Cubahiro;Choi, Donggyu;Jang, Jongwook
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2022년도 추계학술대회 / pp. 9-11 / 2022
  • Applying reinforcement learning in everyday applications and varied environments has proved the potential of the field and revealed pitfalls along the way. In robotics, a learning agent gradually takes over control of a robot by abstracting the robot's navigation model with its inputs and outputs, thus reducing human intervention. The challenge for the agent is implementing a feedback function that facilitates learning an MDP problem in an environment while reducing the method's convergence time. In this paper we implement, in a ROS environment, a reward shaping system that avoids sparse rewards, which give the learning agent little data. Reward shaping prioritizes behaviours that bring the robot closer to the goal by giving intermediate rewards, helping the algorithm converge quickly. We use a pseudocode implementation as an illustration of the method.
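The intermediate-reward idea can be made concrete with potential-based shaping, using the negative distance to the goal as the potential (a standard construction that preserves the optimal policy; the choice of potential here is an assumption, not necessarily the paper's):

```python
def shaped_reward(base_reward: float, dist_before: float,
                  dist_after: float, gamma: float = 0.99) -> float:
    """Potential-based shaping F = gamma * phi(s') - phi(s), with
    phi(s) = -distance_to_goal. Moving closer to the goal yields a
    positive bonus, giving dense feedback without changing the
    optimal policy."""
    phi_before = -dist_before
    phi_after = -dist_after
    return base_reward + gamma * phi_after - phi_before
```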


STAD학습에서 복합보상이 학업성취도와 학습태도에 미치는 효과 (The Effect of the Complex Reward in STAD Learning on Academic Achievement and Learning Attitudes)

  • 김선수;최도성
    • 한국초등과학교육학회지:초등과학교육 / Vol. 21, No. 1 / pp. 101-109 / 2002
  • Cooperative learning has been adopted to consolidate students' autonomous motivation and to develop a desirable attitude in a mutually cooperative atmosphere. Studies on the reward effect have shown that rewards given after evaluation during cooperative learning act directly on students' learning motivation, with group rewards effective for learning attitude and individual rewards effective for academic achievement. Assuming that combining the group reward and the individual reward into a complex reward would produce both effects, this study examined the effect of the complex reward on academic achievement and learning attitude. For this study, two classes were randomly selected from an elementary school in Gwangju, and the learning unit was chapter 4, 「The structure and function of plants」, in the 5-1 elementary Science textbook. The research was conducted for 4 weeks after the students had previously learned with STAD for 8 weeks. Learning attitude was examined in pre- and post-tests, and academic achievement was tested twice at 2-week intervals after the pre-test. The results were analyzed with the SAS program. In academic achievement, both groups showed significant improvement (p<.05). The experimental group showed no significant improvement over the control group in the first test (p>.05), but after 4 weeks it showed significant improvement over the control group in the second test (p<.05). This indicates that rewards should be given over a long period and that the individual-reward component of the complex reward improves academic achievement. In learning attitude, however, there was no meaningful difference between the groups (p>.05), and the control group showed significant improvement compared with the experimental group (p<.05). This suggests that the group reward alone is more effective in improving learning attitude and that the complex reward can decrease individual competition in the experimental group.


Exploring reward efficacy in traffic management using deep reinforcement learning in intelligent transportation system

  • Paul, Ananya;Mitra, Sulata
    • ETRI Journal / Vol. 44, No. 2 / pp. 194-207 / 2022
  • In the last decade, substantial progress has been achieved in intelligent traffic control technologies to overcome the persistent difficulties of traffic congestion and its adverse effects on smart cities. Edge computing is one such advance, facilitating real-time data transmission among vehicles and roadside units to mitigate congestion. This study demonstrates an edge computing-based deep reinforcement learning system that designs a multiobjective reward function to optimize different objectives, seeking to overcome the challenge of evaluating actions with a simple numerical reward. The choice of reward function has a significant impact on the agents' ability to acquire the ideal behavior for managing multiple traffic signals in a large-scale road network. To ascertain effective reward functions, the agent is trained using the proximal policy optimization method in several deep neural network models, including the state-of-the-art transformer network. The system is verified using both hypothetical scenarios and real-world traffic maps. Comprehensive simulation outcomes demonstrate the potency of the suggested reward functions.
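The training method this abstract relies on is PPO; its standard clipped surrogate objective for a single sample can be sketched as:

```python
def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate L^CLIP for one sample: take the minimum of the
    unclipped and clipped terms so a single update cannot move the
    policy far from the one that collected the data."""
    unclipped = ratio * advantage
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(unclipped, clipped_ratio * advantage)
```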

Novel Reward Function for Autonomous Drone Navigating in Indoor Environment

  • Khuong G. T. Diep;Viet-Tuan Le;Tae-Seok Kim;Anh H. Vo;Yong-Guk Kim
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2023년도 추계학술발표대회 / pp. 624-627 / 2023
  • Unmanned aerial vehicles are gaining popularity with the development of science and technology and are being used for a wide range of purposes, including surveillance, rescue, delivery of goods, and data collection. In particular, the ability to avoid obstacles during navigation without human oversight is one of the essential capabilities a drone must possess. Many recent works have solved this problem by implementing a deep reinforcement learning (DRL) model, whose essential core is the reward function. This paper therefore proposes a new reward function with an appropriate action space and employs dueling double deep Q-networks to train a drone to navigate in an indoor environment without collision.
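A hedged sketch of what such an indoor-navigation reward might look like (a progress term, a collision penalty, a goal bonus, and a small per-step cost); all constants are hypothetical, not the paper's proposal:

```python
def drone_reward(dist_prev: float, dist_curr: float,
                 collided: bool, reached: bool,
                 step_penalty: float = 0.01) -> float:
    """Terminal events dominate; otherwise reward progress toward the
    goal minus a small per-step cost that discourages dawdling."""
    if collided:
        return -10.0
    if reached:
        return 10.0
    return (dist_prev - dist_curr) - step_penalty
```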