• Title/Abstract/Keywords: Reinforcement value

461 results (processing time: 0.02 seconds)

Reinforcement Learning using Propagation of Goal-State-Value

  • 김병천;윤병주
    • 한국정보처리학회논문지 / Vol. 6, No. 5 / pp.1303-1311 / 1999
  • To learn in dynamic environments, reinforcement learning algorithms such as Q-learning, TD(0)-learning, and TD(λ)-learning have been proposed. However, most of them learn very slowly because the reinforcement value is given only when the agent reaches the goal state. In this paper, we propose a reinforcement learning method that approaches the goal state quickly in maze environments. The proposed method divides learning into global learning and local learning. Global learning uses the replacing eligibility trace method to search for the goal state. Local learning propagates the goal-state value found through global learning to neighboring states, and then searches for the goal state among those neighboring states. Experiments show that the proposed method finds an optimal solution faster than other reinforcement learning methods such as Q-learning, TD(0)-learning, and TD(λ)-learning.

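
The replacing eligibility trace mentioned in the abstract above can be illustrated with a minimal sketch of tabular TD(λ). This is not the authors' global/local method: the function name `td_lambda_episode`, the corridor-style `path`, and the parameter values are assumptions chosen for illustration. It shows how a trace lets a reward received only at the goal reach earlier states within a single episode.

```python
def td_lambda_episode(values, path, reward_at_goal, alpha=0.5, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda) with replacing traces.

    values: list of state-value estimates, updated in place and returned
    path:   sequence of visited state indices; the final entry is the
            terminal goal state (hypothetical setup for illustration)
    """
    traces = [0.0] * len(values)
    last = len(path) - 1
    for t in range(last):
        state, next_state = path[t], path[t + 1]
        terminal = (t + 1 == last)
        # Reward is given only on the transition into the goal state.
        reward = reward_at_goal if terminal else 0.0
        next_value = 0.0 if terminal else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        # Replacing trace: reset to 1 on a visit instead of accumulating.
        traces[state] = 1.0
        for s in range(len(values)):
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values


# Usage: a 5-state corridor walked left to right, goal at state 4.
values = td_lambda_episode([0.0] * 5, [0, 1, 2, 3, 4], reward_at_goal=1.0)
print(values)
```

With `lam=0.0` the traces vanish after each step and only the state next to the goal is updated, which is the slow-learning behaviour the abstract attributes to methods that receive the reinforcement value only at the goal.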

Comparison of value-based Reinforcement Learning Algorithms in Cart-Pole Environment

  • Byeong-Chan Han;Ho-Chan Kim;Min-Jae Kang
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 3 / pp.166-175 / 2023
  • Reinforcement learning can be applied to a wide variety of problems. However, its fundamental limitation is that it is difficult to derive an answer within a given time because real-world problems are too complex. With the development of neural network technology, research on deep reinforcement learning, which combines deep learning with reinforcement learning, is receiving a lot of attention. In this paper, two types of neural networks are combined with reinforcement learning, and their characteristics are compared and analyzed against existing value-based reinforcement learning algorithms. The two types of neural networks are FNN and CNN, and the existing reinforcement learning algorithms are SARSA and Q-learning.
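
The two value-based algorithms named in the abstract above, SARSA and Q-learning, differ only in the target used for the update. The sketch below uses the standard textbook update rules, not the paper's neural network variants; the function names and the example numbers are assumptions for illustration.

```python
def sarsa_target(reward, next_q, next_action, gamma=0.99):
    """On-policy target: uses the action the agent actually takes next."""
    return reward + gamma * next_q[next_action]


def q_learning_target(reward, next_q, gamma=0.99):
    """Off-policy target: uses the greedy (maximum) next action value."""
    return reward + gamma * max(next_q)


def update(q_value, target, alpha=0.1):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)


# Usage: action values of the next state, where the agent explores
# with action 0 although action 1 has the higher estimate.
next_q = [1.0, 3.0]
print(sarsa_target(1.0, next_q, next_action=0, gamma=0.5))
print(q_learning_target(1.0, next_q, gamma=0.5))
```

When the next action is exploratory, the SARSA target follows that action while the Q-learning target ignores it, which is the on-policy versus off-policy distinction.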

The investigation of pH threshold value on the corrosion of steel reinforcement in concrete

  • Pu, Qi;Yao, Yan;Wang, Ling;Shi, Xingxiang;Luo, Jingjing;Xie, Yifei
    • Computers and Concrete / Vol. 19, No. 3 / pp.257-262 / 2017
  • The aim of this study is to investigate the pH threshold value for the corrosion of steel reinforcement in concrete. A method was designed to obtain the pH value of the pore solution at the location of the steel in concrete. The pH values of the pore solution at that location were then changed by exposing the samples to an accelerated carbonation environment (CO2 5%, RH 40%) for different periods. On this basis, the pH threshold value for the corrosion of steel reinforcement was examined using half-cell potential and electrochemical impedance spectroscopy (EIS). The results indicate that the pH threshold value for the initial corrosion of steel reinforcement in concrete was 11.21. However, in the carbonated concrete, the detection methods mentioned above were found to agree on whether steel corrosion had initiated.

Punching Shear Performance Evaluation of Foundation by Enforcement-length of Shear Head Reinforcement

  • 이용재;이원호;양원직
    • 한국구조물진단유지관리공학회 논문집 / Vol. 21, No. 2 / pp.60-68 / 2017
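  • In this study, a test system was built that allows experiments on outdoor ground identical to field conditions, so that the effect of soil bearing capacity on the foundation slab can be fully considered. To improve economy and constructability, the proposed specimens use steel plates bent into a "ㄷ" (channel) shape, which maximizes the second moment of area and allows on-site assembly. Six specimens in total are compared: one unreinforced specimen, three specimens with the same steel plate thickness but different reinforcement lengths, and two specimens with different plate thicknesses and stiffener reinforcement near the critical section. The tests showed no effect from stiffener reinforcement. For the reinforcement length of the shear reinforcement, a method for determining the effective reinforcement length for each plate thickness is proposed: the shear force at the extended critical section, expressed as soil bearing pressure, and the shear capacity that the reinforcement can carry at the critical section, converted to soil bearing pressure, are compared, and the intersection of the two lines is taken as the effective reinforcement length.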

Effect of External Reinforcement on Stress/strain Characteristics of Critical Current in Ag Alloy Sheathed Bi-2212 Superconducting Tapes

  • 신형섭
    • 한국초전도ㆍ저온공학회논문지 / Vol. 3, No. 1 / pp.6-10 / 2001
  • Stress/strain dependencies of the critical current $I_c$ in AgMgNi sheathed multifilamentary Bi(2212) superconducting tapes were evaluated at 77K, 0T. The external reinforcement was accomplished by soldering Ag-Mg tapes to a single side or both sides of the sample. With the external reinforcement, the strength of the tapes increased but $I_c$ decreased. The $I_c$ degradation characteristic under external reinforcement improved markedly in terms of stress, although the improvement appeared less remarkable on the basis of strain. The effects of external reinforcement are discussed from the viewpoint of the sensitivity of monitoring cracking in the superconducting filaments, by considering the n-value representing the transport behavior of the current. This sensitivity is closely associated with the location of the cracks relative to the voltage-monitoring region in the tape.


Experimental Study on the Flexural Capacity of the U-Flanged Truss Hybrid Beam According to Reinforcement Amounts

  • 오명호;박성진;김영호
    • 한국공간구조학회논문집 / Vol. 21, No. 2 / pp.33-40 / 2021
  • For the practical application of U-flanged Truss Hybrid beams, the flexural capacity of hybrid beams with end reinforcement details using vertical steel plates was verified. Bending tests of U-flanged Truss Hybrid beams were performed using the same top chord under compressive force, but varying the thickness of the bottom plate and the amount of tensile reinforcement. The initial stiffness and maximum load of the specimens with tensile reinforcement are higher than those of the specimen without tensile reinforcement, but the more tensile reinforcement there is, the greater the load decrease after the maximum load. For the specimens with tensile reinforcement, the test results are only 76% to 88% of the flexural strength according to the Korea Design Code, so the safety of U-flanged Truss Hybrid beams with the same details as the specimens cannot be ensured. Therefore, new details need to be developed so that the bottom steel plate and the tensile reinforcement can undergo sufficient tensile deformation.

Study on fracture characteristics of reinforced concrete wedge splitting tests

  • HU, Shaowei;XU, Aiqing;HU, Xin;YIN, Yangyang
    • Computers and Concrete / Vol. 18, No. 3 / pp.337-354 / 2016
  • To study how the addition of reinforcement influences the fracture properties of reinforced concrete wedge splitting test specimens, and how steel bars restrict crack propagation, 7 groups of reinforced concrete specimens with different reinforcement positions and 1 group of plain concrete specimens with the same size factors were designed and constructed for the tests. Based on the double-K fracture criterion and the tests, a fracture toughness calculation model suitable for reinforced concrete wedge splitting tensile specimens was obtained. The results show that the initial cracking load Pini and the unstable fracture load Pun decrease gradually as the distance of the reinforcement from the specimen's top increases. Compared with plain concrete specimens, the addition of a steel bar can reduce the initial fracture toughness KIini, but significantly increases the critical effective crack length ac and the unstable fracture toughness KIun. For tensile concrete members, the anti-cracking effect of reinforcement acts mainly after cracking: fracture initiation is best prevented when the steel bar is placed in the middle of the crack, and crack extension is best inhibited when the reinforcement crosses the crack and is located away from the crack tip.

Reinforcement Learning Speedup Method using Q-value Initialization

  • 최정환
    • 대한전자공학회:학술대회논문집 / Proceedings of the 대한전자공학회 2001 Summer Conference (3) / pp.13-16 / 2001
  • In reinforcement learning, Q-learning converges quite slowly to a good policy. This is because searching for the goal state takes a very long time in a large stochastic domain. So I propose a speedup method that uses Q-value initialization for model-free reinforcement learning. The speedup method learns a naive model of the domain and makes boundaries around the goal state. Using these boundaries, it assigns initial Q-values to the state-action pairs and runs Q-learning from those initial Q-values. The initial Q-values guide the agent to the goal state in the early stages of learning, so that Q-learning updates Q-values efficiently. It therefore saves the exploration time needed to search for the goal state and performs better than Q-learning. I present the Speedup Q-learning algorithm to implement the speedup method. This algorithm is evaluated in a grid-world domain and compared to Q-learning.

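
The idea of seeding Q-learning with informative initial Q-values, as described in the abstract above, can be sketched as follows. This is not the Speedup Q-learning algorithm itself: the abstract does not specify how the boundaries are built, so the sketch assumes a deterministic grid world and uses Manhattan distance to the goal as a stand-in heuristic. The names `init_q_values` and `greedy_action` are hypothetical.

```python
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def init_q_values(rows, cols, goal, gamma=0.9):
    """Assign each state-action pair an initial Q-value that grows as the
    resulting cell gets closer to the goal (assumed heuristic: gamma raised
    to the Manhattan distance from the next cell to the goal)."""
    q = {}
    for r in range(rows):
        for c in range(cols):
            for name, (dr, dc) in ACTIONS.items():
                # Moves off the grid leave the agent where it is.
                nr = min(max(r + dr, 0), rows - 1)
                nc = min(max(c + dc, 0), cols - 1)
                distance = abs(goal[0] - nr) + abs(goal[1] - nc)
                q[((r, c), name)] = gamma ** distance
    return q


def greedy_action(q, state):
    """Pick the action with the highest Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Usage: 3x3 grid with the goal in the bottom-right corner.
q = init_q_values(3, 3, goal=(2, 2))
print(greedy_action(q, (2, 1)))
print(greedy_action(q, (0, 2)))
```

Even before any learning step, the greedy policy under these initial values already points toward the goal, so ordinary Q-learning updates started from `q` would spend less time on undirected exploration.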

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • 전기전자학회논문지 / Vol. 23, No. 4 / pp.1150-1156 / 2019
  • This paper explores a model-free value-based approach for solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs, using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to reach better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.

Studies on the Development of Bearing Capacity Reinforcement for the Foundation of Soil

  • 유동환;최예환;유연택
    • 한국농공학회지 / Vol. 30, No. 1 / pp.38-49 / 1988
  • This paper presents the results of laboratory model tests with variously shaped footings on soil beds reinforced with strips, based on the behaviour of the soil structure under load and on triaxial test results for soil reinforced with geotextiles. The parameters studied were the effects on the bearing capacity of a footing of the first layer of reinforcement, the horizontal and vertical spacing of layers, the number of layers, the tensile strength of the reinforcement, and the inclination of the load to the vertical.
    1. Depending on the strip arrangement, ultimate bearing capacity values improved over those of unreinforced soil, and at failure the soil structure transferred from macrospace to microspace and its arrangement changed from edge-to-edge to face-to-face.
    2. The reinforcement produced reinforcing effects due to controlling the value of factor of one, and permeable reinforcement was never a barrier to drainage.
    3. The strength ratio decreased linearly as the degree of saturation of the soil increased; even at the lower strength ratio, the value of the M-factor did not influence the strength ratio, but impermeable reinforcement decreased the bearing capacity.
    4. The ultimate bearing capacity under the plane-strain condition appeared a little larger than that given by triaxial or other theoretical formulas, and the circular footing was more effective.
    5. The maximum reinforcing effects were obtained at U/B=0.5, B / B=3 and N=3; beyond that limit the reinforcement acted only as an anchor, and fabric of the same strength showed larger reinforcing effects than the thinner one.
    6. As the LDR increased, the BCR increased progressively and a block action appeared below Z/B=0.5; above that value the BCR decreased linearly, with no effect above one.
    7. The coefficient of inclination was at a minimum with three layers of fabric, but the value of H/B related to the ultimate load decreased as the degree of inclination increased, and no reinforcing effect was expected above a value of 4.5. As a consequence of the effects of load inclination, an inclination of 15 per cent decreased the bearing capacity by 70 per cent, but the insertion of geotextile improved it by 45 per cent.
