A Solution Method for the TSP Using Reinforcement Learning

A Learning based Algorithm for Traveling Salesman Problem

  • Lim, JoonMook (Department of Industrial and Management Engineering, Hanbat National University) ;
  • Bae, SungMin (Department of Industrial and Management Engineering, Hanbat National University) ;
  • Suh, JaeJoon (Department of Industrial and Management Engineering, Hanbat National University)
  • Published: 2006.03.31

Abstract

This paper addresses the traveling salesman problem (TSP) with stochastic travel times. In practice, the travel time between demand points varies with the day and the time of day because of traffic interference and congestion. Because most previous studies focus on the TSP with deterministic travel times, their results are difficult to apply directly to logistics problems, many of which involve stochastic conditions such as stochastic travel times. An efficient solution method for the TSP with stochastic travel times is therefore needed. Previous research shows that Q-learning can handle stochastic environments and that a neural network can be used to compute the Q-values of the Q-learning algorithm. In this paper, we propose an algorithm for the TSP with stochastic travel times that integrates Q-learning and a neural network, and we evaluate its validity through computational experiments. From the simulation results, we conclude that the route obtained by the proposed algorithm gives relatively more reliable travel times in logistics settings with stochastic travel times.
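The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It applies Q-learning to a small TSP whose travel times are sampled with noise on every trip. Two simplifications are assumptions made here for brevity: a lookup table stands in for the neural network that the paper uses to compute Q-values, and the noise is uniform around a mean travel time. The names `sample_travel_time`, `q_learning_tsp`, `greedy_tour`, and `mean_times`, as well as all parameter values, are hypothetical.

```python
import random


def sample_travel_time(mean, spread, rng):
    """Draw a noisy travel time around `mean` (uniform noise is an assumption)."""
    return mean * rng.uniform(1.0 - spread, 1.0 + spread)


def q_learning_tsp(mean_times, episodes=2000, alpha=0.1, gamma=1.0,
                   epsilon=0.2, spread=0.3, seed=0):
    """Learn expected cost-to-go values for tours that start and end at city 0."""
    rng = random.Random(seed)
    n = len(mean_times)
    # Key: (current city, set of unvisited cities, next city) -> estimated cost.
    # Unseen entries default to 0.0, which is optimistic when minimizing cost.
    q = {}
    for _ in range(episodes):
        city, unvisited = 0, frozenset(range(1, n))
        while unvisited:
            choices = sorted(unvisited)
            if rng.random() < epsilon:
                nxt = rng.choice(choices)  # explore
            else:
                nxt = min(choices, key=lambda c: q.get((city, unvisited, c), 0.0))
            cost = sample_travel_time(mean_times[city][nxt], spread, rng)
            remaining = unvisited - {nxt}
            if remaining:
                future = min(q.get((nxt, remaining, c), 0.0) for c in remaining)
            else:
                # Last city: add the sampled return leg to the depot.
                cost += sample_travel_time(mean_times[nxt][0], spread, rng)
                future = 0.0
            key = (city, unvisited, nxt)
            old = q.get(key, 0.0)
            # Q-learning update, written for cost minimization.
            q[key] = old + alpha * (cost + gamma * future - old)
            city, unvisited = nxt, remaining
    return q


def greedy_tour(q, n):
    """Follow the lowest estimated cost-to-go from city 0 and return to it."""
    tour, city, unvisited = [0], 0, frozenset(range(1, n))
    while unvisited:
        nxt = min(sorted(unvisited), key=lambda c: q.get((city, unvisited, c), 0.0))
        tour.append(nxt)
        unvisited = unvisited - {nxt}
        city = nxt
    return tour + [0]


# Hypothetical mean travel times between four demand points.
mean_times = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
q = q_learning_tsp(mean_times)
tour = greedy_tour(q, len(mean_times))
print(tour)
```

The table is keyed on the set of unvisited cities, so it grows exponentially with the number of cities. That growth is one reason a function approximator such as a neural network is attractive for estimating Q-values on larger instances.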

