References
- Electric Power Systems Research v.63 no.1 A reinforcement learning approach to automatic generation control T. P. I. Ahamed;P. S. N. Rao;P. S. Sastry https://doi.org/10.1016/S0378-7796(02)00088-3
- Journal of Dynamic Systems, Measurement and Control Data storage in the cerebellar model articulation controller J. S. Albus
- Journal of Dynamic Systems, Measurement and Control A new approach to manipulator control: The cerebellar model articulation controller (CMAC) J. S. Albus
- IEEE Control Systems Magazine v.9 no.3 Learning to control an inverted pendulum using neural networks C. W. Anderson https://doi.org/10.1109/37.24809
- Artificial Intelligence in Engineering v.11 no.4 Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil C. W. Anderson;D. C. Hittle;A. D. Katz;R. M. Kretchmar https://doi.org/10.1016/S0954-1810(97)00004-6
- Machine Learning v.23 Purposive behavior acquisition for a real robot by vision-based reinforcement learning M. Asada;S. Noda;S. Tawaratsumida;K. Hosoda
- Comp. & Maths. with Appls. v.12A Dual control of an integrator with unknown gain K. J. Astrom;A. Helmersson
- Proc. of the Fourteenth International Conference on Machine Learning Robot learning from demonstration C. G. Atkeson;S. Schaal
- Proc. of the International Conference on Machine Learning Residual algorithms: Reinforcement learning with function approximation L. Baird III
- Artificial Intelligence v.72 no.1 Learning to act using real-time dynamic programming A. G. Barto;S. J. Bradtke;S. P. Singh https://doi.org/10.1016/0004-3702(94)00011-O
- IEEE Trans. on Systems, Man, and Cybernetics v.13 no.5 Neuronlike adaptive elements that can solve difficult learning control problems A. G. Barto;R. S. Sutton;C. W. Anderson
- Dynamic Programming R. E. Bellman
- Dynamic Programming and Optimal Control (2nd edition) D. P. Bertsekas
- Proc. of Sixth International Conference on Chemical Process Control Neuro-dynamic programming: An overview D. P. Bertsekas;J. B. Rawlings(ed.);B. A. Ogunnaike(ed.);J. W. Eaton(ed.)
- IEEE Trans. on Automatic Control v.34 no.6 Adaptive aggregation for infinite horizon dynamic programming D. P. Bertsekas;D. A. Castanon https://doi.org/10.1109/9.24227
- Data Networks (2nd edition) D. P. Bertsekas;R. G. Gallager
- Parallel and Distributed Computation: Numerical Methods D. P. Bertsekas;J. N. Tsitsiklis
- Neuro-Dynamic Programming D. P. Bertsekas;J. N. Tsitsiklis
- Probability Theory and Related Fields v.78 A convex analytic approach to Markov decision processes V. Borkar https://doi.org/10.1007/BF00353877
- Advances in Neural Information Processing Systems v.7 Generalization in reinforcement learning: Safely approximating the value function J. A. Boyan;A. W. Moore;G. Tesauro(ed.);D. Touretzky(ed.)
- Advances in Neural Information Processing Systems v.8 Improving elevator performance using reinforcement learning R. Crites;A. G. Barto;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)
- Machine Learning v.33 Elevator group control using multiple reinforcement learning agents R. Crites;A. G. Barto https://doi.org/10.1023/A:1007518724497
- Advances in Neural Information Processing Systems v.5 Reinforcement learning applied to linear quadratic regulation S. J. Bradtke;S. J. Hanson(ed.);J. Cowan(ed.);C. L. Giles(ed.)
- Machine Learning v.8 The convergence of $TD({\lambda})$ for general ${\lambda}$ P. Dayan
- Operations Research v.51 no.6 The linear programming approach to approximate dynamic programming D. P. de Farias;B. Van Roy https://doi.org/10.1287/opre.51.6.850.24925
- Management Science v.16 On linear programming in a Markov decision problem E. V. Denardo
- Proc. of the International Conference on Robotics and Automation A comparison of direct and model-based reinforcement learning C. G. Atkeson;J. Santamaria
- Proc. of the Twelfth International Conference on Machine Learning Stable function approximation in dynamic programming G. J. Gordon
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction T. Hastie;R. Tibshirani;J. Friedman
- Management Science v.25 Linear programming and Markov decision chains A. Hordijk;L. C. M. Kallenberg https://doi.org/10.1287/mnsc.25.4.352
- Computers & Chemical Engineering v.16 no.4 Process control via artificial neural networks and reinforcement learning J. C. Hoskins;D. M. Himmelblau https://doi.org/10.1016/0098-1354(92)80045-B
- Dynamic Programming and Markov Processes R. A. Howard
- Neural Computation v.6 no.6 On the convergence of stochastic iterative dynamic programming algorithms T. Jaakkola;M. I. Jordan;S. P. Singh https://doi.org/10.1162/neco.1994.6.6.1185
- Journal of Artificial Intelligence Research v.4 Reinforcement learning: A survey L. P. Kaelbling;M. L. Littman;A. W. Moore
- International Journal of Robust and Nonlinear Control v.13 no.3;4 Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor N. S. Kaisare;J. M. Lee;J. H. Lee
- Proc. of the Eleventh National Conference on Artificial Intelligence Complexity analysis of real-time reinforcement learning S. Koenig;R. G. Simmons
- Advances in Neural Information Processing Systems v.12 Actor-critic algorithms V. R. Konda;J. N. Tsitsiklis;S. A. Solla(ed.);T. K. Leen(ed.);K.-R. Muller(ed.)
- Proc. of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems Adaptive state space quantisation for reinforcement learning of collision-free navigation B. J. A. Krose;J. W. M. van Dam
- Stochastic Systems: Estimation, Identification and Adaptive Control P. R. Kumar;P. P. Varaiya
- AIChE Annual Meeting Simulation-based dynamic programming strategy for improvement of control policies J. M. Lee;N. S. Kaisare;J. H. Lee
- AIChE Annual Meeting Neuro-dynamic programming approach to dual control problem J. M. Lee;J. H. Lee
- Automatica Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes J. M. Lee;J. H. Lee
- Korean J. Chem. Eng. v.21 no.2 Simulation-based learning of cost-to-go for control of nonlinear processes J. M. Lee;J. H. Lee https://doi.org/10.1007/BF02705417
- Computers & Chemical Engineering v.16 A neural network architecture that computes its own reliability J. A. Leonard;M. A. Kramer;L. H. Ungar https://doi.org/10.1016/0098-1354(92)80035-8
- Machine Learning v.8 Self-improving reactive agents based on reinforcement learning, planning and teaching L.-J. Lin
- Machine Learning v.55 no.2;3 Automatic programming of behavior-based robots using reinforcement learning S. Mahadevan;J. Connell
- Proc. of 14th International Conference on Machine Learning Self-improving factory simulation using continuous-time average-reward reinforcement learning S. Mahadevan;N. Marchalleck;T. K. Das;A. Gosavi
- Management Science v.6 no.3 Linear programming and sequential decisions A. S. Manne https://doi.org/10.1287/mnsc.6.3.259
- IEEE Trans. on Automatic Control v.46 no.2 Simulation-based optimization of Markov reward processes P. Marbach;J. N. Tsitsiklis https://doi.org/10.1109/9.905687
- Computers & Chemical Engineering v.24 Batch process modeling for optimization using reinforcement learning E. C. Martinez https://doi.org/10.1016/S0098-1354(00)00354-9
- Applications of Artificial Neural Networks Temporal difference learning: A chemical process control application S. Miller;R. J. Williams;A. F. Murray(ed.)
- Machine Learning v.21 no.3 The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces A. Moore;C. Atkeson
- PhD thesis, Cambridge University Efficient Memory Based Robot Learning A. W. Moore
- Machine Learning v.13 Prioritized sweeping: Reinforcement learning with less data and less time A. W. Moore;C. G. Atkeson
- Computers & Chemical Engineering v.23 Model predictive control: Past, present and future M. Morari;J. H. Lee https://doi.org/10.1016/S0098-1354(98)00301-9
- Proc. of the International Joint Conference on Artificial Intelligence A convergent reinforcement learning algorithm in the continuous case based on a finite difference method R. Munos
- Machine Learning Journal v.40 A study of reinforcement learning in the continuous case by means of viscosity solutions R. Munos https://doi.org/10.1023/A:1007686309208
- Advances in Neural Information Processing Systems v.10 Enhancing Q-learning for optimal asset allocation R. Neuneier;M. Jordan(ed.);M. Kearns(ed.);S. Solla(ed.)
- IEEE Trans. on Automatic Control v.47 no.10 Kernel-based reinforcement learning in average-cost problems D. Ormoneit;P. W. Glynn https://doi.org/10.1109/TAC.2002.803530
- Machine Learning v.49 Kernel-based reinforcement learning D. Ormoneit;S. Sen https://doi.org/10.1023/A:1017928328829
- Ann. Math. Statist. v.33 On estimation of a probability density function and mode E. Parzen https://doi.org/10.1214/aoms/1177704472
- PhD thesis, Northeastern University Efficient Dynamic Programming-Based Learning for Control J. Peng
- Adaptive Behavior v.1 no.4 Efficient learning and planning within the Dyna framework J. Peng;R. J. Williams https://doi.org/10.1177/105971239300100403
- IEEE Trans. on Neural Networks v.8 no.5 Adaptive critic designs D. V. Prokhorov;D. C. Wunsch II https://doi.org/10.1109/72.623201
- Markov Decision Processes M. L. Puterman
- Control Engineering Practice v.11 no.7 A survey of industrial model predictive control technology S. J. Qin;T. A. Badgwell https://doi.org/10.1016/S0967-0661(02)00186-7
- Mathematical and Computational Techniques for Multilevel Adaptive Methods U. Rude
- Technical Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University On-line Q-learning using connectionist systems G. A. Rummery;M. Niranjan
- Proc. of the Fourth Connectionist Models Summer School Approximating Q-values with basis function representations P. Sabes
- IBM J. Res. Develop. Some studies in machine learning using the game of checkers A. L. Samuel
- IBM J. Res. Develop. Some studies in machine learning using the game of checkers II - recent progress A. L. Samuel
- Adaptive Behavior v.6 no.2 Experiments with reinforcement learning in problems with continuous state and action spaces J. C. Santamaria;R. S. Sutton;A. Ram https://doi.org/10.1177/105971239700600201
- Advances in Neural Information Processing Systems v.9 Learning from demonstration S. Schaal;M. C. Mozer(ed.);M. Jordan(ed.);T. Petsche(ed.)
- IEEE Control Systems v.14 no.1 Robot juggling: an implementation of memory-based learning S. Schaal;C. Atkeson https://doi.org/10.1109/37.257895
- Advances in Neural Information Processing Systems v.6 Temporal difference learning of position evaluation in the game of Go N. N. Schraudolph;P. Dayan;T. J. Sejnowski;J. D. Cowan(ed.);G. Tesauro(ed.);J. Alspector(ed.)
- Advances in Neural Information Processing Systems v.9 Reinforcement learning for dynamic channel allocation in cellular telephone systems S. Singh;D. Bertsekas;M. C. Mozer(ed.);M. I. Jordan(ed.);T. Petsche(ed.)
- Machine Learning v.22 Reinforcement learning with replacing eligibility traces S. P. Singh;R. S. Sutton
- Proc. 17th International Conf. on Machine Learning Practical reinforcement learning in continuous spaces W. D. Smart;L. P. Kaelbling
- Advances in Neural Information Processing Systems v.12 Policy gradient methods for reinforcement learning with function approximation R. Sutton;D. McAllester;S. Singh;Y. Mansour;S. A. Solla(ed.);T. K. Leen(ed.);K.-R. Muller(ed.)
- PhD thesis, University of Massachusetts Temporal Credit Assignment in Reinforcement Learning R. S. Sutton
- Machine Learning v.3 no.1 Learning to predict by the method of temporal differences R. S. Sutton
- Proc. of the Seventh International Conference on Machine Learning Integrated architectures for learning, planning, and reacting based on approximating dynamic programming R. S. Sutton
- Proc. of the Eighth International Workshop on Machine Learning Planning by incremental dynamic programming R. S. Sutton
- Advances in Neural Information Processing Systems v.8 Generalization in reinforcement learning: Successful examples using sparse coarse coding R. S. Sutton;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)
- Psychol. Rev. v.88 no.2 Toward a modern theory of adaptive networks: Expectation and prediction R. S. Sutton;A. G. Barto https://doi.org/10.1037/0033-295X.88.2.135
- Reinforcement Learning: An Introduction R. S. Sutton;A. G. Barto
- Advanced Robotics v.14 no.5 Enhanced continuous valued Q-learning for real autonomous robots M. Takeda;T. Nakamura;M. Imai;T. Ogasawara;M. Asada https://doi.org/10.1163/156855300741852
- Machine Learning v.8 Practical issues in temporal difference learning G. Tesauro
- Neural Computation v.6 no.2 TD-Gammon, a self-teaching backgammon program, achieves master-level play G. Tesauro https://doi.org/10.1162/neco.1994.6.2.215
- Communications of the ACM v.38 no.3 Temporal difference learning and TD-Gammon G. Tesauro https://doi.org/10.1145/203330.203343
- Advances in Neural Information Processing Systems v.7 Learning to play the game of chess S. Thrun;G. Tesauro(ed.);D. S. Touretzky(ed.);T. K. Leen(ed.)
- Proc. of the Fourth Connectionist Models Summer School Issues in using function approximation for reinforcement learning S. Thrun;A. Schwartz
- Machine Learning v.16 Asynchronous stochastic approximation and Q-learning J. N. Tsitsiklis
- IEEE Trans. on Automatic Control v.42 no.5 An analysis of temporal-difference learning with function approximation J. N. Tsitsiklis;B. Van Roy https://doi.org/10.1109/9.580874
- Handbook of Markov Decision Processes: Methods and Applications Neuro-dynamic programming: Overview and recent trends B. Van Roy;E. Feinberg(ed.);A. Shwartz(ed.)
- PhD thesis, University of Cambridge Learning from Delayed Rewards C. J. C. H. Watkins
- Machine Learning v.8 Q-learning C. J. C. H. Watkins;P. Dayan
- General Systems Yearbook v.22 Advanced forecasting methods for global crisis warning and models of intelligence P. J. Werbos
- Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold Approximate dynamic programming for real-time control and neural modeling P. J. Werbos;A. White(ed.);D. A. Sofge(ed.)
- Proc. of the Eighth International Workshop on Machine Learning Complexity and cooperation in Q-learning S. D. Whitehead
- Technical Report NU-CCS-93-14, Northeastern University, College of Computer Science Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems R. J. Williams;L. C. Baird III
- Computers & Chemical Engineering v.21S Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation J. A. Wilson;E. C. Martinez
- Stochastic Problems in Control Stochastic control problems M. Wonham;B. Friedland(ed.)
- PhD thesis, Oregon State University;Technical Report CS-96-30-1 Reinforcement Learning for Job-Shop Scheduling W. Zhang
- Proc. of the Twelfth International Conference on Machine Learning A reinforcement learning approach to job-shop scheduling W. Zhang;T. G. Dietterich
- Advances in Neural Information Processing Systems v.8 High-performance job-shop scheduling with a time-delay $TD({\lambda})$ network W. Zhang;T. G. Dietterich;D. S. Touretzky(ed.);M. C. Mozer(ed.);M. E. Hasselmo(ed.)