• Title/Summary/Keyword: TD-learning

Search Result 29, Processing Time 0.026 seconds

A Study on the Portfolio Performance Evaluation using Actor-Critic Reinforcement Learning Algorithms (액터-크리틱 모형기반 포트폴리오 연구)

  • Lee, Woo Sik
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.25 no.3
    • /
    • pp.467-476
    • /
    • 2022
  • The Bank of Korea raised the benchmark interest rate by a quarter percentage point to 1.75 percent per year, and analysts predict that South Korea's policy rate will reach 2.00 percent by the end of calendar year 2022. Furthermore, because market volatility has been significantly increased by a variety of factors, including rising rates, inflation, and market volatility, many investors have struggled to meet their financial objectives or deliver returns. Banks and financial institutions are attempting to provide Robo-Advisors to manage client portfolios without human intervention in this situation. In this regard, determining the best hyper-parameter combination is becoming increasingly important. This study compares some activation functions of the Deep Deterministic Policy Gradient(DDPG) and Twin-delayed Deep Deterministic Policy Gradient (TD3) Algorithms to choose a sequence of actions that maximizes long-term reward. The DDPG and TD3 outperformed its benchmark index, according to the results. One reason for this is that we need to understand the action probabilities in order to choose an action and receive a reward, which we then compare to the state value to determine an advantage. As interest in machine learning has grown and research into deep reinforcement learning has become more active, finding an optimal hyper-parameter combination for DDPG and TD3 has become increasingly important.

R-Trader: An Automatic Stock Trading System based on Reinforcement learning (R-Trader: 강화 학습에 기반한 자동 주식 거래 시스템)

  • 이재원;김성동;이종우;채진석
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.11
    • /
    • pp.785-794
    • /
    • 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems such as market trend prediction, stock selection, and trading strategies, in a unified framework. But most of the previous trading systems based on supervised learning have a limit in the ultimate performance, because they are not mainly concerned in the integration of those subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement teaming, regarding the process of stock price changes as Markov decision process (MDP). Reinforcement learning is suitable for Joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) and Q, for selecting stocks and optimizing other trading parameters respectively. Technical analysis is also adopted to devise the input features of the system and value functions are approximated by feedforward neural networks. Experimental results on the Korea stock market show that the proposed system outperforms the market average and also a simple trading system trained by supervised learning both in profit and risk management.

A study of Temperal Difference Learning using Nonlinear Function Approximation (비선형 함수 근사화를 사용한 TD학습에 관한 연구)

  • Kwon, Jae-Cheol;Lee, Young-Seog;Kim, Dong-Ok;Seo, Bo-Hyeok
    • Proceedings of the KIEE Conference
    • /
    • 1998.11b
    • /
    • pp.407-409
    • /
    • 1998
  • This paper deals with temporal-difference learning that is a method for approximating long-term future cost as a function of current state in knowlege-poor environment, a function approximator is used to approximate the mapping from state to future cost, a linear function approximator is limited because mapping from state to future cost has a nonlinear characteristic, so a nonlinear function approximator is used to approximate the mapping from state to future cost in this paper, and that TD learning using a nonlinear function approximator is stable is proved.

  • PDF

The Improvement of Convergence Rate in n-Queen Problem Using Reinforcement learning (강화학습을 이용한 n-Queen 문제의 수렴속도 향상)

  • Lim SooYeon;Son KiJun;Park SeongBae;Lee SangJo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.1
    • /
    • pp.1-5
    • /
    • 2005
  • The purpose of reinforcement learning is to maximize rewards from environment, and reinforcement learning agents learn by interacting with external environment through trial and error. Q-Learning, a representative reinforcement learning algorithm, is a type of TD-learning that exploits difference in suitability according to the change of time in learning. The method obtains the optimal policy through repeated experience of evaluation of all state-action pairs in the state space. This study chose n-Queen problem as an example, to which we apply reinforcement learning, and used Q-Learning as a problem solving algorithm. This study compared the proposed method using reinforcement learning with existing methods for solving n-Queen problem and found that the proposed method improves the convergence rate to the optimal solution by reducing the number of state transitions to reach the goal.

Capacitated Fab Scheduling Approximation using Average Reward TD(${\lambda}$) Learning based on System Feature Functions (시스템 특성함수 기반 평균보상 TD(${\lambda}$) 학습을 통한 유한용량 Fab 스케줄링 근사화)

  • Choi, Jin-Young
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.34 no.4
    • /
    • pp.189-196
    • /
    • 2011
  • In this paper, we propose a logical control-based actor-critic algorithm as an efficient approach for the approximation of the capacitated fab scheduling problem. We apply the average reward temporal-difference learning method for estimating the relative value functions of system states, while avoiding deadlock situation by Banker's algorithm. We consider the Intel mini-fab re-entrant line for the evaluation of the suggested algorithm and perform a numerical experiment by generating some sample system configurations randomly. We show that the suggested method has a prominent performance compared to other well-known heuristics.

Enhancing VANET Security: Efficient Communication and Wormhole Attack Detection using VDTN Protocol and TD3 Algorithm

  • Vamshi Krishna. K;Ganesh Reddy K
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.1
    • /
    • pp.233-262
    • /
    • 2024
  • Due to the rapid evolution of vehicular ad hoc networks (VANETs), effective communication and security are now essential components in providing secure and reliable vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. However, due to their dynamic nature and potential threats, VANETs need to have strong security mechanisms. This paper presents a novel approach to improve VANET security by combining the Vehicular Delay-Tolerant Network (VDTN) protocol with the Deep Reinforcement Learning (DRL) technique known as the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. A store-carry-forward method is used by the VDTN protocol to resolve the problems caused by inconsistent connectivity and disturbances in VANETs. The TD3 algorithm is employed for capturing and detecting Worm Hole Attack (WHA) behaviors in VANETs, thereby enhancing security measures. By combining these components, it is possible to create trustworthy and effective communication channels as well as successfully detect and stop rushing attacks inside the VANET. Extensive evaluations and simulations demonstrate the effectiveness of the proposed approach, enhancing both security and communication efficiency.

Function Approximation for Reinforcement Learning using Fuzzy Clustering (퍼지 클러스터링을 이용한 강화학습의 함수근사)

  • Lee, Young-Ah;Jung, Kyoung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.587-592
    • /
    • 2003
  • Many real world control problems have continuous states and actions. When the state space is continuous, the reinforcement learning problems involve very large state space and suffer from memory and time for learning all individual state-action values. These problems need function approximators that reason action about new state from previously experienced states. We introduce Fuzzy Q-Map that is a function approximators for 1 - step Q-learning and is based on fuzzy clustering. Fuzzy Q-Map groups similar states and chooses an action and refers Q value according to membership degree. The centroid and Q value of winner cluster is updated using membership degree and TD(Temporal Difference) error. We applied Fuzzy Q-Map to the mountain car problem and acquired accelerated learning speed.

Reinforcement Learning Using State Space Compression (상태 공간 압축을 이용한 강화학습)

  • Kim, Byeong-Cheon;Yun, Byeong-Ju
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.3
    • /
    • pp.633-640
    • /
    • 1999
  • Reinforcement learning performs learning through interacting with trial-and-error in dynamic environment. Therefore, in dynamic environment, reinforcement learning method like Q-learning and TD(Temporal Difference)-learning are faster in learning than the conventional stochastic learning method. However, because many of the proposed reinforcement learning algorithms are given the reinforcement value only when the learning agent has reached its goal state, most of the reinforcement algorithms converge to the optimal solution too slowly. In this paper, we present COMREL(COMpressed REinforcement Learning) algorithm for finding the shortest path fast in a maze environment, select the candidate states that can guide the shortest path in compressed maze environment, and learn only the candidate states to find the shortest path. After comparing COMREL algorithm with the already existing Q-learning and Priortized Sweeping algorithm, we could see that the learning time shortened very much.

  • PDF

Minimize Order Picking Time through Relocation of Products in Warehouse Based on Reinforcement Learning (물품 출고 시간 최소화를 위한 강화학습 기반 적재창고 내 물품 재배치)

  • Kim, Yeojin;Kim, Geuntae;Lee, Jonghwan
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.2
    • /
    • pp.90-94
    • /
    • 2022
  • In order to minimize the picking time when the products are released from the warehouse, they should be located close to the exit when the products are released. Currently, the warehouse determines the loading location based on the order of the requirement of products, that is, the frequency of arrival and departure. Items with lower requirement ranks are loaded away from the exit, and items with higher requirement ranks are loaded closer from the exit. This is a case in which the delivery time is faster than the products located near the exit, even if the products are loaded far from the exit due to the low requirement ranking. In this case, there is a problem in that the transit time increases when the product is released. In order to solve the problem, we use the idle time of the stocker in the warehouse to rearrange the products according to the order of delivery time. Temporal difference learning method using Q_learning control, which is one of reinforcement learning types, was used when relocating items. The results of rearranging the products using the reinforcement learning method were compared and analyzed with the results of the existing method.

Genetic Algorithm based Neural Network and Temporal Difference Learning: Janggi Board Game (유전자기반 신경회로망과 Temporal Difference학습: 장기보드게임)

  • 박인규
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2002.05c
    • /
    • pp.308-314
    • /
    • 2002
  • 본 논문은 2인용 보드게임의 정보에 대한 전략을 학습할 수 있는 방법을 유전자기반 역전파 신경회로망과 Temporal Difference학습알고리즘을 이용하여 제안하였다. 학습의 과정은 역전파에 의한 초기학습에 이어 국부해의 단점을 극복하기 위하여 미세학습으로 유전자알고리즘을 이용하였다. 시스템의 구성은 탐색을 담당하는 부분과 기물의 수를 발생하는 부분으로 구성되어 있다. 수의 발생부분은 보드의 상태에 따라서 갱신되고, 탐색커널은 αβ탐색을 기본으로 유전자알고리즘을 이용하여 가중치를 최적화하는 유전자기반 역전파 신경회로망과 TD학습을 결합하여 게임에 대해 양호한 평가함수를 학습하였다. 일반적으로 많은 학습을 통하여 평가함수의 정확도가 보장되면 승률이 학습의 양에 비례함을 알 수 있었다.

  • PDF