References
- M. R. Dogar and S. S. Srinivasa, "A planning framework for non-prehensile manipulation under clutter and uncertainty," Autonomous Robots, vol. 33, no. 3, pp. 217-236, 2012, DOI: 10.1007/s10514-012-9306-z.
- M. Fox and D. Long, "PDDL2.1: An extension to PDDL for expressing temporal planning domains," Journal of Artificial Intelligence Research, vol. 20, pp. 61-124, 2003, DOI: 10.1613/jair.1129.
- S. Srivastava, E. Fang, L. Riano, R. Chitnis, S. Russell, and P. Abbeel, "Combined task and motion planning through an extensible planner-independent interface layer," 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, pp. 639-646, 2014, DOI: 10.1109/ICRA.2014.6906922.
- R. Munos, "From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning," Foundations and Trends® in Machine Learning, vol. 7, no. 1, pp. 1-129, 2014, DOI: 10.1561/2200000038.
- Y. Labbe, S. Zagoruyko, I. Kalevatykh, I. Laptev, J. Carpentier, M. Aubry, and J. Sivic, "Monte-Carlo tree search for efficient visually guided rearrangement planning," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3715-3722, April 2020, DOI: 10.1109/LRA.2020.2980984.
- P. Christiano, Z. Shah, I. Mordatch, J. Schneider, T. Blackwell, J. Tobin, P. Abbeel, and W. Zaremba, "Transfer from simulation to real world through learning deep inverse dynamics model," arXiv preprint arXiv:1610.03518, 2016, [Online], https://arxiv.org/abs/1610.03518.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2017, DOI: 10.1109/TPAMI.2017.2699184.
- T. Schaul, D. Horgan, K. Gregor, and D. Silver, "Universal value function approximators," The 32nd International Conference on Machine Learning, pp. 1312-1320, 2015, [Online], http://proceedings.mlr.press/v37/schaul15.html.
- M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, "Rainbow: Combining improvements in deep reinforcement learning," arXiv preprint arXiv:1710.02298, 2017, [Online], https://arxiv.org/abs/1710.02298.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015, DOI: 10.1038/nature14236.
- R. Dearden, N. Friedman, and S. Russell, "Bayesian Q-learning," Innovative Applications of Artificial Intelligence Conference, pp. 761-768, 1998, [Online], https://www.aaai.org/Papers/AAAI/1998/AAAI98-108.pdf.
- H. van Hasselt, "Double Q-learning," Advances in Neural Information Processing Systems 23 (NIPS 2010), pp. 2613-2621, 2010, [Online], https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," arXiv preprint arXiv:1511.05952, 2015, [Online], https://arxiv.org/abs/1511.05952.
- Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling network architectures for deep reinforcement learning," The 33rd International Conference on Machine Learning, pp. 1995-2003, 2016, [Online], http://proceedings.mlr.press/v48/wangf16.html.
- M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, "Noisy networks for exploration," arXiv preprint arXiv:1706.10295, 2017, [Online], https://arxiv.org/abs/1706.10295.
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988, DOI: 10.1007/BF00115009.
- M. G. Bellemare, W. Dabney, and R. Munos, "A distributional perspective on reinforcement learning," arXiv preprint arXiv:1707.06887, 2017, [Online], https://arxiv.org/abs/1707.06887.
- X. Zhu and A. B. Goldberg, "Introduction to semi-supervised learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 3, no. 1, pp. 1-130, 2009, DOI: 10.2200/S00196ED1V01Y200906AIM006.
- D. M. Allen, "Mean square error of prediction as a criterion for selecting variables," Technometrics, vol. 13, no. 3, pp. 469-475, 1971, DOI: 10.1080/00401706.1971.10488811.
- I. V. Tetko, D. J. Livingstone, and A. I. Luik, "Neural network studies. 1. Comparison of overfitting and overtraining," Journal of Chemical Information and Computer Sciences, vol. 35, no. 5, pp. 826-833, 1995, DOI: 10.1021/ci00027a006.