• Title/Abstract/Keyword: Path of Reinforcement


A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques

  • 한정수
    • 한국통신학회논문지 / Vol.31 No.3B / pp.175-182 / 2006
  • This paper proposes a method that uses POMDP (Partially Observable Markov Decision Processes) and an exploration bonus technique for localized adaptive QoS routing. Because computing the optimal action for a POMDP via Dynamic Programming is very complex and difficult, the problem is simplified by using expected values through the CEA (Certainty Equivalency Approximation) technique, and the exploration bonus approach is used to search for paths better than the current one. To this end, a multi-path search algorithm (SEMA) is proposed. Furthermore, the performance parameters $\phi$ and k are used to define the frequency and interval of exploration, and the service success rate and the average hop count of successful requests are examined as the amount of exploration varies. The results show that as $\phi$ increases, paths better than the current one are found, and that exploration increases as k increases.
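
The exploration-bonus idea in this abstract can be sketched as follows: each candidate path's estimated value is augmented by a bonus that grows the longer the path has gone untried, and a gain parameter (named `k` here, loosely echoing the paper's k) controls how aggressively alternatives to the current path are explored. The function names and the square-root bonus shape are illustrative assumptions, not the paper's SEMA algorithm.

```python
import math

def select_path(paths, t, k=1.0):
    """Pick the path maximizing estimated value plus an exploration bonus.

    `paths` maps a path id to (value_estimate, step_last_tried); the bonus
    grows with the time since a path was last tried, so seldom-explored
    paths are periodically revisited. (Illustrative sketch only.)
    """
    def score(item):
        value, last_tried = item[1]
        return value + k * math.sqrt(t - last_tried)  # sqrt-shaped bonus
    return max(paths.items(), key=score)[0]

# path -> (estimated value, step when last tried)
paths = {"A": (0.9, 9), "B": (0.5, 0)}
select_path(paths, t=10, k=0.1)  # small k: exploit the better-valued "A"
select_path(paths, t=10, k=1.0)  # large k: explore the long-untried "B"
```

With a small `k` the current best path keeps being used; raising `k` makes the bonus dominate, matching the abstract's observation that exploration increases with k.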

자율이동체의 주행 시험을 위한 선분과 원호로 이루어진 경로 자동 생성 방법 (A method for automatically generating a route consisting of line segments and arcs for autonomous vehicle driving test)

  • 조세형
    • 전기전자학회논문지 / Vol.27 No.1 / pp.1-11 / 2023
  • Route-driving tests are required to develop autonomous vehicles and autonomous mobile robots. Such tests are performed not only in real environments but also in simulation. In particular, when data from diverse environments are needed for development based on reinforcement learning and deep learning, development is also carried out through simulators, which requires not only manually designed routes but also a variety of automatically and randomly generated ones. Such test-track designs can also be used for actual construction and fabrication. This paper introduces a method for randomly generating a driving-test route composed of a combination of arcs and line segments. It consists of a collision test that computes the distance between an arc and a line segment, and an algorithm that, whenever the route can no longer be extended, deletes part of it and rebuilds a suitable continuation.
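
The distance-based collision test the abstract mentions can be sketched with standard geometry: the shortest distance from a segment to a point gives the segment-to-circle distance, and an arc piece can be conservatively approximated by its full supporting circle. Function names, the clearance parameter, and the full-circle approximation are assumptions for illustration; the paper's exact arc-versus-segment test is not given in the abstract.

```python
import math

def segment_point_distance(p1, p2, c):
    """Shortest distance from point c to the line segment p1-p2."""
    (x1, y1), (x2, y2), (cx, cy) = p1, p2, c
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:                       # degenerate segment
        return math.hypot(cx - x1, cy - y1)
    # clamp the projection of c onto the segment to [0, 1]
    t = max(0.0, min(1.0, ((cx - x1) * dx + (cy - y1) * dy) / length_sq))
    return math.hypot(cx - (x1 + t * dx), cy - (y1 + t * dy))

def segment_hits_arc_circle(p1, p2, center, radius, clearance):
    """Collision test between a straight piece and an arc piece,
    approximating the arc by its full supporting circle: collide when
    the segment comes within `clearance` of the circle boundary."""
    d = segment_point_distance(p1, p2, center)
    return abs(d - radius) < clearance
```

A generator in the spirit of the paper would run such a test against all earlier route pieces before appending a new arc or segment, and backtrack (delete recent pieces) when no collision-free continuation exists.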

Weight Adjustment Scheme Based on Hop Count in Q-routing for Software Defined Networks-enabled Wireless Sensor Networks

  • Godfrey, Daniel;Jang, Jinsoo;Kim, Ki-Il
    • Journal of information and communication convergence engineering / Vol.20 No.1 / pp.22-30 / 2022
  • The reinforcement learning algorithm has proven its potential in solving sequential decision-making problems under uncertainty, such as finding paths to route data packets in wireless sensor networks. With reinforcement learning, computing the optimum path requires careful definition of the so-called reward function, a linear function that aggregates multiple objective functions into a single numerical value (reward) to be maximized. In a typical linear reward function, the objectives to be optimized are combined as a weighted sum with fixed weighting factors for all learning agents. This study proposes a reinforcement learning-based routing protocol for wireless sensor networks in which different learning agents prioritize different objectives by assigning different weighting factors to the aggregated objectives of the reward function. We assign weighting factors to the objectives in a sensor node's reward function according to its hop-count distance to the sink node, expecting this approach to enhance the effectiveness of multi-objective reinforcement learning with a balanced trade-off among competing parameters. Furthermore, we propose an SDN (Software Defined Networks) architecture with multiple controllers for constant network monitoring, allowing learning agents to adapt to the dynamics of the network conditions. Simulation results show that the proposed scheme enhances the performance of wireless sensor networks under varied conditions, such as node density and traffic intensity, with a good trade-off among competing performance metrics.
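
The hop-count-dependent weighting described above can be sketched as a weighted-sum reward whose weights interpolate with a node's hop distance to the sink. The two objectives (energy cost and queue delay), the 0.2-0.8 weight range, and the linear interpolation are illustrative assumptions, not the paper's exact rule.

```python
def reward_weights(hop_count, max_hops):
    """Shift objective weights with a node's hop distance to the sink:
    nodes near the sink weight congestion (queue delay) more heavily,
    far nodes weight energy more. (Range and metrics are assumptions.)"""
    frac = hop_count / max_hops            # 0 at the sink, 1 at the edge
    w_energy = 0.2 + 0.6 * frac
    return w_energy, 1.0 - w_energy        # (w_energy, w_queue)

def reward(hop_count, max_hops, energy_cost, queue_delay):
    """Negated weighted sum, so maximizing reward minimizes both costs."""
    w_e, w_q = reward_weights(hop_count, max_hops)
    return -(w_e * energy_cost + w_q * queue_delay)
```

Each learning agent then maximizes its own reward, so a node one hop from the sink effectively optimizes congestion while a node at the network edge optimizes energy, which is the balanced trade-off the abstract targets.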

Multi Colony Intensification/Diversification Interaction Ant Reinforcement Learning Using Temporal Difference Learning

  • 이승관
    • 한국콘텐츠학회논문지 / Vol.5 No.5 / pp.1-9 / 2005
  • This paper proposes a multi-colony interaction ant reinforcement learning model based on an Ant-Q ant model with Temporal Difference learning. The model consists of several independent ant-system colonies, which interact according to inter-colony elite strategies (intensification and diversification). The intensification strategy enables good path selection by using heuristic information from other agent colonies: through positive inter-colony interaction, agents are led to select edges that are frequently visited. The diversification strategy makes agents avoid frequently visited edges through negative interaction driven by the search information of other colonies. Experiments show that with these strategies the proposed reinforcement learning converges to the optimal solution faster than the conventional ant colony system and Ant-Q learning.
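
The TD flavor of Ant-Q that this abstract builds on can be sketched with the standard Ant-Q value update, which blends the old edge value with a reinforcement term plus the discounted best value reachable from the next node. This is a sketch of the generic Ant-Q rule; the paper's multi-colony elite strategies are not modeled here, and the dictionary layout is an assumption.

```python
def ant_q_update(AQ, r, s, delta, alpha=0.1, gamma=0.3):
    """One Ant-Q value update for moving from node r to node s:
    AQ(r,s) <- (1-alpha)*AQ(r,s) + alpha*(delta + gamma*max_z AQ(s,z)).
    The bootstrapped max over successors of s is the TD-style term."""
    best_next = max(AQ[s].values()) if AQ.get(s) else 0.0
    AQ[r][s] = (1 - alpha) * AQ[r][s] + alpha * (delta + gamma * best_next)
    return AQ[r][s]

# AQ[r][s] holds the learned desirability of edge (r, s)
AQ = {0: {1: 1.0}, 1: {2: 2.0}, 2: {}}
ant_q_update(AQ, 0, 1, delta=0.0, alpha=0.5, gamma=0.5)
```

In a multi-colony variant in the spirit of the paper, `delta` would be raised for edges favored by other colonies (intensification) or lowered for heavily visited ones (diversification).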


A Study on the Fatigue Crack Propagation Behavior of $Al_2O_3$/AC4C Composites Made by Squeeze Casting Process

  • 여인동;이지환
    • 한국주조공학회지 / Vol.15 No.4 / pp.388-396 / 1995
  • This study examines the fatigue crack growth characteristics of $Al_2O_3$ short-fiber-reinforced aluminum matrix composites made by the squeeze casting process with different applied pressures and binder amounts. Fatigue crack growth experiments were performed under the constant-load-amplitude method with a fixed load ratio. The rate of crack propagation decreased with binder amount as well as applied pressure. The fatigue crack growth path in the matrix also changed from flat to rough mode with increasing applied pressure. In the composites, the fatigue crack propagated along the interface between matrix and reinforcement at 10 MPa, but through the reinforcement at 20 MPa. The major reason for this result is considered to be that the interfacial bonding force and the microstructure of the matrix were improved by the increase in applied pressure. Localized ductile striations were observed in the composites at the low-growth-rate region, and this phenomenon became more remarkable with increasing applied pressure. At the high-growth-rate region, the fracture appearance changed from interfacial debonding to reinforcement fracture with increasing applied pressure.


Fast Motion Planning of Wheel-legged Robot for Crossing 3D Obstacles using Deep Reinforcement Learning

  • 정순규;원문철
    • 로봇학회논문지 / Vol.18 No.2 / pp.143-154 / 2023
  • In this study, a fast motion planning method for the swing motion of a 6x6 wheel-legged robot to traverse large obstacles and gaps is proposed. The motion planning method presented in the previous paper, which was based on trajectory optimization, took up to tens of seconds and was limited to two-dimensional, structured vertical obstacles and trenches. A deep neural network based on a one-dimensional Convolutional Neural Network (CNN) is introduced to generate keyframes, which are then used to represent smooth reference commands for the six leg angles along the robot's path. The network is initially trained by behavioral cloning on a dataset gathered from previous trajectory-optimization simulation results, and its performance is then improved through reinforcement learning using a one-step REINFORCE algorithm. The trained model increased the speed of motion planning by up to 820 times and improved the success rate of obstacle crossing under harsh conditions such as low friction and high roughness.
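
The one-step REINFORCE fine-tuning step named in the abstract can be sketched on a toy softmax policy over discrete actions, where the policy gradient of the log-probability of the taken action with respect to logit i is (1 if i is the action else 0) minus the action probability. This generic sketch does not reproduce the paper's 1D-CNN keyframe network; all names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    """One-step REINFORCE update on a softmax policy:
    logit_i += lr * reward * (1[i == action] - pi(i))."""
    probs = softmax(logits)
    return [l + lr * reward * ((1.0 if i == action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

# a positive reward raises the probability of the taken action
logits = reinforce_step([0.0, 0.0], action=0, reward=1.0)
```

After initializing by behavioral cloning, repeated updates of this form shift the policy toward actions that earn higher reward, which is the improvement loop the abstract describes.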

Autonomous and Asynchronous Triggered Agent Exploratory Path-planning Via a Terrain Clutter-index using Reinforcement Learning

  • Kim, Min-Suk;Kim, Hwankuk
    • Journal of information and communication convergence engineering / Vol.20 No.3 / pp.181-188 / 2022
  • An intelligent distributed multi-agent system (IDMS) using reinforcement learning (RL) poses a challenging and intricate problem in which one or more agents aim to achieve their specific goals (sub-goals and a final goal) by moving through a complex, cluttered environment. The IDMS environment provides a cumulative optimal reward for each action based on the policy of the learning process. Most actions involve interacting with a given IDMS environment, which therefore provides the following elements: a starting agent state, multiple obstacles, agent goals, and a clutter index. The reward is also reflected by the RL-based agents, which can move randomly or intelligently to reach their respective goals and thereby improve learning performance. We extend the intelligent multi-agent systems of our previous works in two ways: (a) a proposed environment-clutter-based index for agent sub-goal selection and an analysis of its effect, and (b) a newly proposed RL reward scheme based on the environmental clutter index, used to identify and analyze the prerequisites and conditions for improving the overall system.

Post-peak response analysis of SFRC columns including spalling and buckling

  • Dhakal, Rajesh P.
    • Structural Engineering and Mechanics / Vol.22 No.3 / pp.311-330 / 2006
  • Standard compression tests of steel fiber reinforced concrete (SFRC) cylinders are conducted to formulate the compressive stress versus compressive strain relationship of SFRC. Axial pullout tests of SFRC specimens are also conducted to explore its tensile stress-strain relationship. Cover concrete spalling and reinforcement buckling models developed originally for normal reinforced concrete are modified to extend their application to SFRC. The monotonic material models of concrete and reinforcing bars in SFRC members thus obtained are combined with the unloading/reloading loops used in the cyclic models of concrete and reinforcing bars in normal reinforced concrete. The resulting path-dependent cyclic material models are then incorporated in a finite-element-based fiber analysis program. Their applicability at the member level is verified by simulating cyclic lateral loading tests of SFRC columns under constant axial compression. The analysis using the proposed SFRC models yields results that are much closer to the experimental results than those obtained using the normal reinforced concrete models.

Generalized evolutionary optimum design of fiber-reinforced tire belt structure

  • Cho, J.R.;Lee, J.H.;Kim, K.W.;Lee, S.B.
    • Steel and Composite Structures / Vol.15 No.4 / pp.451-466 / 2013
  • This paper deals with the multi-objective optimization of tire reinforcement structures such as the tread belt and the carcass path. The multi-objective functions are defined in terms of discrete design variables and approximated by an artificial neural network, and the sensitivity analyses of these functions are replaced with iterative genetic evolution. The multi-objective optimization algorithm introduced in this paper is not only highly CPU-time-efficient but also applicable to other multi-objective optimization problems in which the objective function, the design variables and the constraints are discrete rather than continuous. Through illustrative numerical experiments, the fiber-reinforced tire belt structure is optimally tailored. The proposed algorithm is not limited to the tire reinforcement structure but extends to generalized multi-objective structural optimization problems in various engineering applications.
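
The evolution-driven search over discrete design variables that this abstract describes can be sketched as a tiny elitist genetic loop: keep the top half of the population, refill with one-point crossover plus a point mutation. The surrogate neural network and the tire-specific multi-objective functions are not reproduced; all names, rates, and the single-objective `fitness` interface are assumptions for illustration.

```python
import random

def evolve(fitness, choices, pop_size=20, generations=30, seed=0):
    """Elitist genetic search over discrete design variables.

    `choices` lists the allowed values for each design variable;
    `fitness` maps a candidate tuple to a score to maximize."""
    rng = random.Random(seed)
    pop = [tuple(rng.choice(c) for c in choices) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(choices))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(len(choices))       # point mutation
            child[i] = rng.choice(choices[i])
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)

# toy run: maximize the sum over four binary design variables
best = evolve(lambda x: sum(x), [[0, 1]] * 4)
```

Because selection, crossover, and mutation need only fitness evaluations, no sensitivity analysis of the objective is required, which is the point the abstract makes about discrete, non-differentiable problems.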

Damage Behavior of High Strength Reinforced Concrete Columns under Biaxial Lateral Loading

  • 박재영
    • 한국콘크리트학회:학술대회논문집 / Proceedings of the Korea Concrete Institute 2000 Spring Conference / pp.411-416 / 2000
  • The behavior of high strength reinforced concrete columns subjected to uniaxial reversed loading and biaxial reversed circular-path loading was investigated. Four full-scale test specimens were tested. All specimens adopted a cantilever type so that the critical region would be located only at the bottom of the column. The parameters studied were the transverse reinforcement ratio, uniaxial lateral loading, and biaxial lateral loading. The damage features of columns under biaxial loading differ from those under uniaxial loading; however, the maximum strength and the drift angle at maximum strength were almost the same under uniaxial and biaxial loading. The transverse reinforcement under biaxial loading was very effective in increasing the ductility of the specimens.
