Approximate Dynamic Programming Strategies and Their Applicability for Process Control: A Review and Future Directions

  • Lee, Jong-Min (School of Chemical and Biomolecular Engineering, Georgia Institute of Technology)
  • Lee, Jay H. (School of Chemical and Biomolecular Engineering, Georgia Institute of Technology)
  • Published: 2004.09.01

Abstract

This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and Neuro-Dynamic Programming (NDP), which can be viewed as approximate DP techniques, are already established methods for solving difficult multi-stage decision problems in operations research, computer science, and robotics. Owing to significant disparities in problem formulation and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
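
As a concrete illustration of the relationship the abstract describes between DP and its simulation-based approximations, the sketch below runs exact value iteration and tabular Q-learning side by side on a small, invented Markov decision process. Everything in it (the dynamics, rewards, and parameters) is hypothetical and not taken from the paper; it only shows the sense in which RL estimates the DP cost-to-go from simulated transitions.

```python
# Minimal, self-contained illustration (hypothetical model, not from the paper):
# exact dynamic programming via value iteration, and Q-learning as a
# simulation-based approximation of the same optimal cost-to-go.
import random

GAMMA = 0.9                 # discount factor
N_STATES, N_ACTIONS = 4, 2  # a tiny, invented Markov decision process

# Deterministic toy dynamics: action a moves state s to (s + a + 1) mod N_STATES.
# A unit reward is earned whenever the successor state is 0. All values are made up.
def step(s, a):
    s2 = (s + a + 1) % N_STATES
    r = 1.0 if s2 == 0 else 0.0
    return s2, r

def value_iteration(tol=1e-8):
    """Exact DP: apply the Bellman optimality operator until convergence."""
    V = [0.0] * N_STATES
    while True:
        # Deterministic transitions, so the expectation is a single term.
        V_new = [max(step(s, a)[1] + GAMMA * V[step(s, a)[0]]
                     for a in range(N_ACTIONS))
                 for s in range(N_STATES)]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return V_new
        V = V_new

def q_learning(n_steps=50000, alpha=0.1, eps=0.1):
    """Model-free RL: estimate the optimal Q-function from simulated transitions."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(n_steps):
        # epsilon-greedy exploration
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda b: Q[s][b])
        s2, r = step(s, a)
        target = r + GAMMA * max(Q[s2])          # bootstrapped Bellman target
        Q[s][a] += alpha * (target - Q[s][a])    # stochastic-approximation update
        s = s2
    return Q

if __name__ == "__main__":
    V = value_iteration()
    Q = q_learning()
    print("DP cost-to-go (value iteration):", [round(v, 3) for v in V])
    print("RL estimate   (max_a Q(s,a))   :", [round(max(q), 3) for q in Q])
```

Under sufficient exploration, the tabular Q-learning estimates approach the DP values. The methods surveyed in the paper address the harder setting relevant to process control, where the state space is too large for a table and the cost-to-go must be represented by a function approximator.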

References

  1. Electric Power Systems Research v.63 no.1 A reinforcement learning approach to automatic generation control T. P. I. Ahamed;P. S. N. Rao;P. S. Sastry https://doi.org/10.1016/S0378-7796(02)00088-3
  2. Journal of Dynamic Systems, Measurement and Control Data storage in the cerebellar model articulation controller J. S. Albus
  3. Journal of Dynamic Systems, Measurement and Control A new approach to manipulator control: The cerebellar model articulation controller (CMAC) J. S. Albus
  4. IEEE Control Systems Magazine v.9 no.3 Learning to control an inverted pendulum using neural networks C. W. Anderson https://doi.org/10.1109/37.24809
  5. Artificial Intelligence in Engineering v.11 no.4 Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil C. W. Anderson;D. C. Hittle;A. D. Katz;R. M. Kretchmar https://doi.org/10.1016/S0954-1810(97)00004-6
  6. Machine Learning v.23 Purposive behavior acquisition for a real robot by vision-based reinforcement learning M. Asada;S. Noda;S. Tawaratsumida;K. Hosoda
  7. Comp. & Maths. with Appls. v.12A Dual control of an integrator with unknown gain K. J. Astrom;A. Helmersson
  8. Proc. of the Fourteenth International Conference on Machine Learning Robot learning from demonstration C. G. Atkeson;S. Schaal
  9. Proc. of the International Conference on Machine Learning Residual algorithms: Reinforcement learning with function approximation L. Baird III
  10. Artificial Intelligence v.72 no.1 Learning to act using real-time dynamic programming A. G. Barto;S. J. Bradtke;S. P. Singh https://doi.org/10.1016/0004-3702(94)00011-O
  11. IEEE Trans. on Systems, Man, and Cybernetics v.13 no.5 Neuronlike adaptive elements that can solve difficult learning control problems A. G. Barto;R. S. Sutton;C. W. Anderson
  12. Dynamic Programming R. E. Bellman
  13. Dynamic Programming and Optimal Control(2nd edition) D. P. Bertsekas
  14. Proc. of Sixth International Conference on Chemical Process Control Neuro-dynamic programming: An overview D. P. Bertsekas;J. B. Rawlings(ed.);B. A. Ogunnaike(ed.);J. W. Eaton(ed.)
  15. IEEE Trans. on Automatic Control v.34 no.6 Adaptive aggregation for infinite horizon dynamic programming D. P. Bertsekas;D. A. Castanon https://doi.org/10.1109/9.24227
  16. Data Networks(2nd edition) D. P. Bertsekas;R. G. Gallager
  17. Parallel and Distributed Computation: Numerical Methods D. P. Bertsekas;J. N. Tsitsiklis
  18. Neuro-Dynamic Programming D. P. Bertsekas;J. N. Tsitsiklis
  19. Probability Theory and Related Fields v.78 A convex analytic approach to Markov decision processes V. Borkar https://doi.org/10.1007/BF00353877
  20. Advances in Neural Information Processing Systems v.7 Generalization in reinforcement learning: Safely approximating the value function J. A. Boyan;A. W. Moore;G. Tesauro(ed.);D. Touretzky(ed.)
  21. Advances in Neural Information Processing Systems v.8 Improving elevator performance using reinforcement learning R. Crites;A. G. Barto;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)
  22. Machine Learning v.33 Elevator group control using multiple reinforcement learning agents R. Crites;A. G. Barto https://doi.org/10.1023/A:1007518724497
  23. Advances in Neural Information Processing Systems v.5 Reinforcement learning applied to linear quadratic regulation S. J. Bradtke;S. J. Hanson(ed.);J. Cowan(ed.);C. L. Giles(ed.)
  24. Machine Learning v.8 The convergence of $TD({\lambda})$ for general ${\lambda}$ P. Dayan
  25. Operations Research v.51 no.6 The linear programming approach to approximate dynamic programming D. P. de Farias;B. Van Roy https://doi.org/10.1287/opre.51.6.850.24925
  26. Management Science v.16 On linear programming in a Markov decision problem E. V. Denardo
  27. Proc. of the International Conference on Robotics and Automation A comparison of direct and model-based reinforcement learning C. G. Atkeson;J. Santamaria
  28. Proc. of the Twelfth International Conference on Machine Learning Stable function approximation in dynamic programming G. J. Gordon
  29. The Elements of Statistical Learning: Data Mining, Inference, and Prediction T. Hastie;R. Tibshirani;J. Friedman
  30. Management Science v.25 Linear programming and Markov decision chains A. Hordijk;L. C. M. Kallenberg https://doi.org/10.1287/mnsc.25.4.352
  31. Computers & Chemical Engineering v.16 no.4 Process control via artificial neural networks and reinforcement learning J. C. Hoskins;D. M. Himmelblau https://doi.org/10.1016/0098-1354(92)80045-B
  32. Dynamic Programming and Markov Processes R. A. Howard
  33. Neural Computation v.6 no.6 On the convergence of stochastic iterative dynamic programming algorithms T. Jaakkola;M. I. Jordan;S. P. Singh https://doi.org/10.1162/neco.1994.6.6.1185
  34. Journal of Artificial Intelligence Research v.4 Reinforcement learning: A survey L. P. Kaelbling;M. L. Littman;A. W. Moore
  35. International Journal of Robust and Nonlinear Control v.13 no.3-4 Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor N. S. Kaisare;J. M. Lee;J. H. Lee
  36. Proc. of the Eleventh National Conference on Artificial Intelligence Complexity analysis of real-time reinforcement learning S. Koenig;R. G. Simmons
  37. Advances in Neural Information Processing Systems v.12 Actor-critic algorithms V. R. Konda;J. N. Tsitsiklis;S. A. Solla(ed.);T. K. Leen(ed.);K.-R. Muller(ed.)
  38. Proc. of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems Adaptive state space quantisation for reinforcement learning of collision-free navigation B. J. A. Krose;J. W. M. van Dam
  39. Stochastic Systems: Estimation, Identification and Adaptive Control P. R. Kumar;P. P. Varaiya
  40. AIChE Annual Meeting Simulation-based dynamic programming strategy for improvement of control policies J. M. Lee;N. S. Kaisare;J. H. Lee
  41. AIChE Annual Meeting Neuro-dynamic programming approach to dual control problem J. M. Lee;J. H. Lee
  42. Automatica Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes J. M. Lee;J. H. Lee
  43. Korean J. Chem. Eng. v.21 no.2 Simulation-based learning of cost-to-go for control of nonlinear processes J. M. Lee;J. H. Lee https://doi.org/10.1007/BF02705417
  44. Computers & Chemical Engineering v.16 A neural network architecture that computes its own reliability J. A. Leonard;M. A. Kramer;L. H. Ungar https://doi.org/10.1016/0098-1354(92)80035-8
  45. Machine Learning v.8 Self-improving reactive agents based on reinforcement learning, planning and teaching L.-J. Lin
  46. Artificial Intelligence v.55 no.2-3 Automatic programming of behavior-based robots using reinforcement learning S. Mahadevan;J. Connell
  47. Proc. of 14th International Conference on Machine Learning Self-improving factory simulation using continuous-time average-reward reinforcement learning S. Mahadevan;N. Marchalleck;T. K. Das;A. Gosavi
  48. Management Science v.6 no.3 Linear programming and sequential decisions A. S. Manne https://doi.org/10.1287/mnsc.6.3.259
  49. IEEE Trans. on Automatic Control v.46 no.2 Simulation-based optimization of Markov reward processes P. Marbach;J. N. Tsitsiklis https://doi.org/10.1109/9.905687
  50. Computers & Chemical Engineering v.24 Batch process modeling for optimization using reinforcement learning E. C. Martinez https://doi.org/10.1016/S0098-1354(00)00354-9
  51. Applications of Artificial Neural Networks Temporal difference learning: A chemical process control application S. Miller;R. J. Williams;A. F. Murray(ed.)
  52. Machine Learning v.21 no.3 The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces A. Moore;C. Atkeson
  53. PhD thesis, Cambridge University Efficient Memory Based Robot Learning A. W. Moore
  54. Machine Learning v.13 Prioritized sweeping: Reinforcement learning with less data and less time A. W. Moore;C. G. Atkeson
  55. Computers & Chemical Engineering v.23 Model predictive control: Past, present and future M. Morari;J. H. Lee https://doi.org/10.1016/S0098-1354(98)00301-9
  56. Proc. of the International Joint Conference on Artificial Intelligence A convergent reinforcement learning algorithm in the continuous case based on a finite difference method R. Munos
  57. Machine Learning v.40 A study of reinforcement learning in the continuous case by means of viscosity solutions R. Munos https://doi.org/10.1023/A:1007686309208
  58. Advances in Neural Information Processing Systems v.10 Enhancing Q-learning for optimal asset allocation R. Neuneier;M. Jordan(ed.);M. Kearns(ed.);S. Solla(ed.)
  59. IEEE Trans. on Automatic Control v.47 no.10 Kernel-based reinforcement learning in average-cost problems D. Ormoneit;P. W. Glynn https://doi.org/10.1109/TAC.2002.803530
  60. Machine Learning v.49 Kernel-based reinforcement learning D. Ormoneit;S. Sen https://doi.org/10.1023/A:1017928328829
  61. Ann. Math. Statist. v.33 On estimation of a probability density function and mode E. Parzen https://doi.org/10.1214/aoms/1177704472
  62. PhD thesis, Northeastern University Efficient Dynamic Programming-Based Learning for Control J. Peng
  63. Adaptive Behavior v.1 no.4 Efficient learning and planning within the Dyna framework J. Peng;R. J. Williams https://doi.org/10.1177/105971239300100403
  64. IEEE Trans. on Neural Networks v.8 no.5 Adaptive critic designs D. V. Prokhorov;D. C. Wunsch II https://doi.org/10.1109/72.623201
  65. Markov Decision Processes M. L. Puterman
  66. Control Engineering Practice v.11 no.7 A survey of industrial model predictive control technology S. J. Qin;T. A. Badgwell https://doi.org/10.1016/S0967-0661(02)00186-7
  67. Mathematical and Computational Techniques for Multilevel Adaptive Methods U. Rude
  68. Technical Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University On-line Q-learning using connectionist systems G. A. Rummery;M. Niranjan
  69. Proc. of the Fourth Connectionist Models Summer School Approximating Q-values with basis function representations P. Sabes
  70. IBM J. Res. Develop. Some studies in machine learning using the game of checkers A. L. Samuel
  71. IBM J. Res. Develop. Some studies in machine learning using the game of checkers II - recent progress A. L. Samuel
  72. Adaptive Behavior v.6 no.2 Experiments with reinforcement learning in problems with continuous state and action spaces J. C. Santamaria;R. S. Sutton;A. Ram https://doi.org/10.1177/105971239700600201
  73. Advances in Neural Information Processing Systems v.9 Learning from demonstration S. Schaal;M. C. Mozer(ed.);M. Jordan(ed.);T. Petsche(ed.)
  74. IEEE Control Systems v.14 no.1 Robot juggling: an implementation of memory-based learning S. Schaal;C. Atkeson https://doi.org/10.1109/37.257895
  75. Advances in Neural Information Processing Systems v.6 Temporal difference learning of position evaluation in the game of Go N. N. Schraudolph;P. Dayan;T. J. Sejnowski;J. D. Cowan(ed.);G. Tesauro(ed.);J. Alspector(ed.)
  76. Advances in Neural Information Processing Systems v.9 Reinforcement learning for dynamic channel allocation in cellular telephone systems S. Singh;D. Bertsekas;M. C. Mozer(ed.);M. I. Jordan(ed.);T. Petsche(ed.)
  77. Machine Learning v.22 Reinforcement learning with replacing eligibility traces S. P. Singh;R. S. Sutton
  78. Proc. 17th International Conf. on Machine Learning Practical reinforcement learning in continuous spaces W. D. Smart;L. P. Kaelbling
  79. Advances in Neural Information Processing Systems v.12 Policy gradient methods for reinforcement learning with function approximation R. Sutton;D. McAllester;S. Singh;Y. Mansour;S. A. Solla(ed.);T. K. Leen(ed.);K.-R. Muller(ed.)
  80. PhD thesis, University of Massachusetts Temporal Credit Assignment in Reinforcement Learning R. S. Sutton
  81. Machine Learning v.3 no.1 Learning to predict by the method of temporal differences R. S. Sutton
  82. Proc. of the Seventh International Conference on Machine Learning Integrated architectures for learning, planning, and reacting based on approximating dynamic programming R. S. Sutton
  83. Proc. of the Eighth International Workshop on Machine Learning Planning by incremental dynamic programming R. S. Sutton
  84. Advances in Neural Information Processing Systems v.8 Generalization in reinforcement learning: Successful examples using sparse coarse coding R. S. Sutton;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)
  85. Psychol. Rev. v.88 no.2 Toward a modern theory of adaptive networks: Expectation and prediction R. S. Sutton;A. G. Barto https://doi.org/10.1037/0033-295X.88.2.135
  86. Reinforcement Learning: An Introduction R. S. Sutton;A. G. Barto
  87. Advanced Robotics v.14 no.5 Enhanced continuous valued Q-learning for real autonomous robots M. Takeda;T. Nakamura;M. Imai;T. Ogasawara;M. Asada https://doi.org/10.1163/156855300741852
  88. Machine Learning v.8 Practical issues in temporal difference learning G. Tesauro
  89. Neural Computation v.6 no.2 TD-Gammon, a self-teaching backgammon program, achieves master-level play G. Tesauro https://doi.org/10.1162/neco.1994.6.2.215
  90. Communications of the ACM v.38 no.3 Temporal difference learning and TD-Gammon G. Tesauro https://doi.org/10.1145/203330.203343
  91. Advances in Neural Information Processing Systems v.7 Learning to play the game of chess S. Thrun;G. Tesauro(ed.);D. S. Touretzky(ed.);T. K. Leen(ed.)
  92. Proc. of the Fourth Connectionist Models Summer School Issues in using function approximation for reinforcement learning S. Thrun;A. Schwartz
  93. Machine Learning v.16 Asynchronous stochastic approximation and Q-learning J. N. Tsitsiklis
  94. IEEE Trans. on Automatic Control v.42 no.5 An analysis of temporal-difference learning with function approximation J. N. Tsitsiklis;B. Van Roy https://doi.org/10.1109/9.580874
  95. Handbook of Markov Decision Processes: Methods and Applications Neuro-dynamic programming: Overview and recent trends B. Van Roy;E. Feinberg(ed.);A. Shwartz(ed.)
  96. PhD thesis, University of Cambridge Learning from Delayed Rewards C. J. C. H. Watkins
  97. Machine Learning v.8 Q-learning C. J. C. H. Watkins;P. Dayan
  98. General Systems Yearbook v.22 Advanced forecasting methods for global crisis warning and models of intelligence P. J. Werbos
  99. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches (Van Nostrand Reinhold) Approximate dynamic programming for real-time control and neural modeling P. J. Werbos;A. White(ed.);D. A. Sofge(ed.)
  100. Proc. of the Eighth International Workshop on Machine Learning Complexity and cooperation in Q-learning S. D. Whitehead
  101. Technical Report NU-CCS-93-14, Northeastern University, College of Computer Science Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems R. J. Williams;L. C. Baird III
  102. Computers & Chemical Engineering v.21S Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation J. A. Wilson;E. C. Martinez
  103. Stochastic Problems in Control Stochastic control problems M. Wonham;B. Friedland(ed.)
  104. PhD thesis, Oregon State University;Technical Report CS-96-30-1 Reinforcement Learning for Job-Shop Scheduling W. Zhang
  105. Proc. of the Twelfth International Conference on Machine Learning A reinforcement learning approach to job-shop scheduling W. Zhang;T. G. Dietterich
  106. Advances in Neural Information Processing Systems v.8 High-performance job-shop scheduling with a time-delay $TD({\lambda})$ network W. Zhang;T. G. Dietterich;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)