References
- Zhong, Y., Zhang, H., Jain, A.K.: Automatic Caption Localization in Compressed Video. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 4 (2000).
- Su, J.: Study of Video Captioning Problem. Princeton University (2018).
- Jain, A.K., Yu, B.: Automatic text location in images and video frames. In: Pattern Recognition, Vol. 31, pp. 2055-2076 (1998). https://doi.org/10.1016/S0031-3203(98)00067-3
- Lee, C.C., Chiang, Y.-C., Huang, H.-M., Tsai, C.-L.: A Fast Caption Localization and Detection for News Videos. In: Second International Conference on Innovative Computing, Information and Control (2007).
- Wang, X., Dong, G.: A Novel Approach for Captions Detection in Video Sequences. In: International Conference on Fuzzy Systems and Knowledge Discovery (2009).
- Watanabe, K., Sugiyama, M.: Automatic caption generation for video data. Time alignment between caption and acoustic signal. In: IEEE Third Workshop on Multimedia Signal Processing (1999).
- Suzuki, T., Kitazume, T., Sugiyama, M.: The latest achievement of VC project for automatic video caption generation. In: IEEE Workshop on Multimedia Signal Processing (2002).
- Liu, Y., Dey, S., Lu, Y.: Enhancing Video Encoding for Cloud Gaming Using Rendering Information. In: IEEE Transactions on Circuits and Systems for Video Technology, 25(12) (2015).
- Gade, A.A., Vyavahare, A.J.: Feature Extraction using GLCM for Dietary Assessment Application. In: International Journal Multimedia and Image Processing (IJMIP), Vol. 8, Issue 2 (2018).
- Kanimozhi, P., Sathiya, S., Balasubramanian, M., Sivaguru, P., Sivaraj, P.: Evaluation of Machine Learning and Deep Learning Approaches to Classify Breast Cancer Using Thermography. In: International Journal of Psychology and Education, Vol. 58, No. 2 (2021).
- Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: arXiv preprint arXiv:1502.03044 (2015).
- Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image caption generator. In: CVPR (2015).
- Donahue, J., Hendricks, L.A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR (2015).
- Hendricks, L.A., Venugopalan, S., Rohrbach, M., Mooney, R., Saenko, K., Darrell, T.: Deep compositional captioning: Describing novel object categories without paired training data. In: CVPR (2016).
- Fang, H., Gupta, S., Iandola, F., Srivastava, R.K., Deng, L., Dollar, P., Gao, J.: From captions to visual concepts and back. In: CVPR (2015).
- Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (m-RNN). In: arXiv preprint arXiv:1412.6632 (2014).
- Venugopalan, S., Hendricks, L.A., Rohrbach, M., Mooney, R., Saenko, K., Darrell, T.: Captioning images with diverse objects. In: arXiv preprint arXiv:1606.07770 (2016).
- Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR (2015).
- Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: Fully convolutional localization networks for dense captioning. In: CVPR (2016).
- Hochreiter, S., Schmidhuber, J.: Long short-term memory. In: Neural Computation, 9(8) (1997).
- Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.: Translating videos to natural language using deep recurrent neural networks. In: NAACL (2015).
- Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing Videos by Exploiting Temporal Structure. In: IEEE International Conference on Computer Vision (ICCV) (2015).
- Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language. In: CVPR (2016).
- Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence - video to text. In: ICCV (2015).
- Pan, P., Xu, Z., Yang, Y., Wu, F., Zhuang, Y.: Hierarchical recurrent neural encoder for video representation with application to captioning. In: CVPR (2016).
- Yang, Y., Zhou, J., Ai, J., Bin, Y., Hanjalic, A., Shen, H.T., Ji, Y.: Video captioning by adversarial LSTM. In: IEEE Transactions on Image Processing, 27, 5600-5611 (2018). https://doi.org/10.1109/TIP.2018.2855422
- Donahue, J., Hendricks, L.A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: IEEE Trans. Pattern Anal. Mach. Intell., 39, 677-691 (2017). https://doi.org/10.1109/TPAMI.2016.2599174
- Yan, C., Tu, Y., Wang, X., Zhang, Y., Hao, X., Zhang, Y., Dai, Q.: STAT: Spatial-temporal attention mechanism for video captioning. In: IEEE Trans. Multimed. 22, 229-241 (2019). https://doi.org/10.1109/tmm.2019.2924576
- Pan, P., Xu, Z., Yang, Y., Wu, F., Zhuang, Y.: Hierarchical recurrent neural encoder for video representation with application to captioning. In: IEEE Conference on Computer Vision and Pattern Recognition (2016).
- Yu, H., Wang, J., Huang, Z., Yang, Y., Xu, W.: Video paragraph captioning using hierarchical recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2016).
- Hendricks, L.A., Venugopalan, S., Rohrbach, M., Mooney, R., Saenko, K., Darrell, T.: Deep compositional captioning: Describing novel object categories without paired training data. In: IEEE Conference on Computer Vision and Pattern Recognition (2016).
- Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012).
- Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: arXiv preprint arXiv:1409.1556 (2014).
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (2015).
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016).
- Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2016).
- Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015).
- Badrinarayanan, V., Handa, A., Cipolla, R.: SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. In: arXiv preprint arXiv:1505.07293 (2015).
- Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. In: arXiv preprint arXiv:1606.00915 (2016).
- Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2015).