Video Classification System Based on Similarity Representation Among Sequential Data

Lee, Hosuk; Yang, Jihoon

doi:10.3745/KTCCS.2018.7.1.1

KIPS Transactions on Computer and Communication Systems, Vol. 7, No. 1, pp. 1-8, 2018. pISSN 2287-5891, eISSN 2734-049X

Korea Information Processing Society


Original Korean title: 순차 데이터간의 유사도 표현에 의한 동영상 분류 (Video Classification by Similarity Representation Among Sequential Data)

Hosuk Lee (Department of Computer Science and Engineering, Sogang University);
Jihoon Yang (Department of Computer Science and Engineering, Sogang University)

Received: 2017.05.25
Accepted: 2017.11.18
Published: 2018.01.31

https://doi.org/10.3745/KTCCS.2018.7.1.1

Abstract

Video data carries temporal information along with a large volume of content and noise, so learning a simple representation of it is difficult. This study proposes a similarity representation among sequential data, together with a deep learning training method, that expresses video data in a more abstract and compact form. The approach learns a function such that, when the similarity between the image vectors that make up a video is expressed as an inner product, those vectors carry as much information about one another as possible. Experiments on real data confirm that the proposed method achieves better classification performance than existing video classification methods.
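As a rough illustration of similarity between frame vectors expressed as an inner product, the following Python sketch passes each frame vector through a function and collects the pairwise inner products in a matrix. The abstract does not specify the form of the learned function, the training objective, or any dimensions, so everything here is an assumption made for illustration: the untrained linear map followed by tanh, the helper names `project` and `similarity_matrix`, and the sizes `T`, `D`, and `K` are all hypothetical.

```python
import math
import random


def project(frame, weights):
    """Map one frame vector through a linear layer followed by tanh.

    This is only a stand-in for the learned function; the real form is not
    given in the abstract.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]


def similarity_matrix(frames, weights):
    """Return the T x T matrix of inner products between projected frames."""
    embedded = [project(frame, weights) for frame in frames]
    return [[sum(a * b for a, b in zip(u, v)) for v in embedded] for u in embedded]


random.seed(0)
T, D, K = 4, 6, 3  # frames per clip, input size, embedding size (hypothetical)

# Random vectors stand in for per-frame image features.
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

# Untrained placeholder weights; a real system would learn these.
weights = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(K)]

S = similarity_matrix(frames, weights)
for row in S:
    print(" ".join(f"{value:7.3f}" for value in row))
```

The matrix `S` is symmetric, and entry `S[i][j]` is the inner product between the projected vectors of frames `i` and `j`. With trained weights in place of the random placeholders, a matrix of this kind could serve as a compact summary of a clip, though how the paper uses it for classification is not stated in the abstract.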


