Video Compression Standard Prediction using Attention-based Bidirectional LSTM

  • Kim, Sangmin (Department of Electronics and Computer Engineering, Hanyang University) ;
  • Park, Bumjun (Department of Electronics and Computer Engineering, Hanyang University) ;
  • Jeong, Jechang (Department of Electronics and Computer Engineering, Hanyang University)
  • Received : 2019.07.24
  • Accepted : 2019.08.12
  • Published : 2019.09.30

Abstract

In this paper, we propose an attention-based bidirectional LSTM (BLSTM) for predicting the compression standard of a video. In natural language processing (NLP), recurrent neural network (RNN) architectures have long been studied for predicting the next word of a sentence and for classifying or translating sentences by their semantics, and this line of research has been commercialized in chatbots, AI speakers, translation applications, and so on. The LSTM was designed to solve the vanishing gradient problem of RNNs and is widely used in NLP. The proposed algorithm makes compression-standard prediction possible by applying a BLSTM, together with an attention mechanism that focuses on the most important words in a sentence, to the bitstream of a video rather than to a natural-language sentence.
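
The abstract gives no implementation details, so the following PyTorch sketch is only a minimal illustration of the architecture it describes: bitstream bytes treated as word-like tokens, a bidirectional LSTM over the token sequence, and a softmax attention layer that pools the hidden states before classification. All names and hyperparameters (256-value byte vocabulary, embedding size 64, hidden size 128, four codec classes, 512-byte chunks) are illustrative assumptions, not values taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionBLSTM(nn.Module):
        """Sketch of an attention-based BLSTM for codec classification (assumed sizes)."""
        def __init__(self, vocab_size=256, embed_dim=64, hidden_dim=128, num_classes=4):
            super().__init__()
            # Each bitstream byte (0-255) plays the role of a "word" and is embedded.
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.blstm = nn.LSTM(embed_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)
            # Attention assigns one score per time step over the BLSTM outputs.
            self.attn = nn.Linear(2 * hidden_dim, 1)
            self.fc = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, x):
            # x: (batch, seq_len) integer byte values
            h, _ = self.blstm(self.embed(x))      # (batch, seq_len, 2*hidden_dim)
            scores = self.attn(torch.tanh(h))     # (batch, seq_len, 1)
            weights = F.softmax(scores, dim=1)    # normalize over time steps
            context = (weights * h).sum(dim=1)    # attention-weighted summary
            return self.fc(context)               # codec-class logits

    # Usage: classify dummy 512-byte bitstream chunks into one of four classes.
    model = AttentionBLSTM()
    chunks = torch.randint(0, 256, (8, 512))      # batch of 8 byte sequences
    logits = model(chunks)                        # shape (8, 4)
    predicted = logits.argmax(dim=1)              # predicted codec index per chunk

The bidirectional pass lets each byte's representation see both earlier and later context within the chunk, and the softmax-normalized attention weights play the role the abstract describes for words in a sentence: they let the classifier concentrate on the most informative positions in the bitstream.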
