Trends in quantum reinforcement learning: State-of-the-arts and the road ahead

  • Soohyun Park (Division of Computer Science, Sookmyung Women's University)
  • Joongheon Kim (Department of Electrical and Computer Engineering, Korea University)
  • Received: 2024.03.27
  • Reviewed: 2024.08.13
  • Published: 2024.10.10

Abstract

This paper presents the fundamentals of quantum reinforcement learning theory and its applications to various engineering problems. With advances in quantum computing and deep learning technologies, many research works have focused on quantum deep learning and quantum machine learning. In this paper, quantum neural network (QNN)-based reinforcement learning (RL) models are introduced and discussed. Moreover, the advantages of QNN-based RL algorithms and models, such as fast training, high scalability, and efficient learning parameter utilization, are presented along with various research results. In addition, one of the well-known multi-agent extensions of QNN-based RL models, the quantum centralized-critic and multiple-actor network, is discussed, and its applications to multi-agent cooperation and coordination are introduced. Finally, applications and future research directions are discussed in terms of federated learning, split learning, autonomous control, and quantum deep learning software testing.

Funding information

This research was supported by the MSIT (Ministry of Science and ICT (Information and Communications Technology)), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2024-00436887) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
