Acknowledgement
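Grant: Development of a Multi-Modality-Based Media Application Framework Enabling Personal Media to Be Connected, Shared, Combined, and Reconfigured
Supported by: Institute for Information & Communications Technology Promotion (IITP)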
References
- J.-H. Shin, S.-K. Baek and P.-K. Kim, "Video Event Detection according to Generating of Semantic Unit based on Moving Object," Journal of Korea Multimedia Society, Vol. 11, No. 2, pp. 143-152, 2008.
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-Scale Video Classification with Convolutional Neural Networks," Proc. of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732, 2014.
- S. Yu et al., "CMU-Informedia@ TRECVID 2014 Multimedia Event Detection (MED)," Proc. of the 2014 TRECVID Video Retrieval Evaluation Workshop, 2014.
- J. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga and G. Toderici, "Beyond Short Snippets: Deep Networks for Video Classification," Proc. of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702, 2015.
- Z. Wu, Y.-G. Jiang, J. Wang, J. Pu and X. Xue, "Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification," Proc. of the 22nd ACM International Conference on Multimedia, pp. 167-176, 2014.
- C. Szegedy et al., "Going Deeper with Convolutions," Proc. of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
- F. Rosenblatt, Principles of Neurodynamics, Spartan Books, 1962.
- K. Soomro, A. R. Zamir and M. Shah, "UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild," CRCV-TR-12-01, 2012.
- A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25, pp. 1097-1105, 2012.
- Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791
- S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Proc. of the International Conference on Learning Representations 2015, 2015.
- B. Zhu, W. Li and X. Xue, "A Novel Audio Fingerprinting Method Robust to Time Scale Modification and Pitch Shifting," Proc. of the 18th ACM International Conference on Multimedia, pp. 987-990, 2010.
- M. Sahidullah and G. Saha, "Design, Analysis and Experimental Evaluation of Block Based Transformation in MFCC Computation for Speaker Recognition," Speech Communication, Vol. 54, No. 4, pp. 543-565, 2012. https://doi.org/10.1016/j.specom.2011.11.004
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," Proc. of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
- LISA Lab, Theano, http://deeplearning.net/software/theano/, 2015.
- D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, "Learning Spatiotemporal Features with 3D Convolutional Networks," Proc. of the 2015 IEEE International Conference on Computer Vision, pp. 4489-4497, 2015.
- N. Srivastava, E. Mansimov and R. Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs," Proc. of the 32nd International Conference on Machine Learning, pp. 843-852, 2015.
- H. Ye, Z. Wu, R.-W. Zhao, X. Wang, Y.-G. Jiang and X. Xue, "Evaluating Two-Stream CNN for Video Classification," Proc. of the 5th ACM International Conference on Multimedia Retrieval, pp. 435-442, 2015.