References
- J. Donahue et al., Long-term recurrent convolutional networks for visual recognition and description, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 2625-2634.
- O. Vinyals et al., Show and tell: A neural image caption generator, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 3156-3164.
- Y. Dong et al., Improving interpretability of deep neural networks with semantic information, arXiv preprint arXiv:1703.04096, 2017.
- L.A. Hendricks et al., Generating visual explanations, in Eur. Conf. Comput. Vision, Amsterdam, The Netherlands, Oct. 2016, pp. 3-19.
- L.A. Hendricks et al., Deep compositional captioning: Describing novel object categories without paired training data, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 1-10.
- Q. You et al., Image captioning with semantic attention, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 4651-4659.
- S.J. Rennie et al., Self-critical sequence training for image captioning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 1179-1195.
- Q. Wu et al., Image captioning and visual question answering based on attributes and external knowledge, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018), no. 6, 1367-1381. https://doi.org/10.1109/TPAMI.2017.2708709
- Y. Yu et al., End-to-end concept word detection for video captioning, retrieval, and question answering, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3261-3269.
- P. Anderson et al., Bottom-up and top-down attention for image captioning and VQA, arXiv preprint arXiv:1707.07998, 2017.
- J. Lu et al., Knowing when to look: Adaptive attention via a visual sentinel for image captioning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3242-3250.
- T. Yao et al., Boosting image captioning with attributes, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 22-29, 2017.
- C. Wang, H. Yang, and C. Meinel, Image captioning with deep bidirectional LSTMs and multi-task learning, ACM Trans. Multimedia Comput. Commun. Appl. 14 (2018), no. 2s, 1-20.
- C. Szegedy et al., Going deeper with convolutions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 1-9.
- S. Reed et al., Learning deep representations of fine-grained visual descriptions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 49-58.
- L. Zhang et al., Learning a deep embedding model for zero-shot learning, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3010-3019.
- X. He and Y. Peng, Fine-grained image classification via combining vision and language, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 7332-7340.
- R. Kiros, R. Salakhutdinov, and R.S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, arXiv preprint arXiv:1411.2539, 2014.
- J. Mao et al., Learning like a child: Fast novel visual concept learning from sentence descriptions of images, in Proc. IEEE Int. Conf. Comput. Vision, Santiago, Chile, 2015, pp. 2533-2541.
- R. Vedantam et al., Context-aware captions from context-agnostic supervision, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 1070-1079.
- A.H. Abdulnabi et al., Multi-task CNN model for attribute prediction, IEEE Trans. Multimedia 17 (2015), no. 11, 1949-1959. https://doi.org/10.1109/TMM.2015.2477680
- T.-H. Chen et al., Show, adapt and tell: Adversarial training of cross-domain image captioner, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 2017, pp. 521-530.
- R.R. Selvaraju et al., Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, Oct. 2017, pp. 618-626.
- Y.-C. Yoon et al., Fine-grained mobile application clustering model using retrofitted document embedding, ETRI J. 39 (2017), no. 4, 443-454. https://doi.org/10.4218/etrij.17.0116.0936
- S. Kong and C. Fowlkes, Low-rank bilinear pooling for fine-grained classification, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 7025-7034.
- S. Yu et al., A model for fine-grained vehicle classification based on deep learning, Neurocomput. 257 (2017), 97-103. https://doi.org/10.1016/j.neucom.2016.09.116
- X.-S. Wei et al., Selective convolutional descriptor aggregation for fine-grained image retrieval, IEEE Trans. Image Process. 26 (2017), no. 6, 2868-2881. https://doi.org/10.1109/TIP.2017.2688133
- G.-S. Xie et al., LG-CNN: From local parts to global discrimination for fine-grained recognition, Pattern Recogn. 71 (2017), 118-131. https://doi.org/10.1016/j.patcog.2017.06.002
- S.H. Lee, HGO-CNN: Hybrid generic-organ convolutional neural network for multi-organ plant classification, in Proc. IEEE Int. Conf. Image Process., Beijing, China, Sept. 2017, pp. 4462-4466.
- A. Li et al., Zero-shot fine-grained classification by deep feature learning with semantics, arXiv preprint arXiv:1707.00785, 2017.
- Z. Akata et al., Evaluation of output embeddings for fine-grained image classification, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 2927-2936.
- R. Ranjan, V. M. Patel, and R. Chellappa, Hyperface: A deep multitask learning framework for face detection, landmark localization, pose estimation, and gender recognition, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2018), 121-135. https://doi.org/10.1109/TPAMI.2017.2781233
- K. Hashimoto et al., A joint many-task model: Growing a neural network for multiple NLP tasks, arXiv preprint arXiv:1611.01587, 2016.
- R. Caruana, Multitask learning: a knowledge-based source of inductive bias, in Proc. Int. Conf. Mach. Learn., Amherst, MA, USA, June 1993, pp. 41-48.
- L. Duong et al., Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser, in Proc. Annu. Meeting Association Computat. Linguistics Int. Joint Conf. Natural Language Process., Beijing, China, July 2015, pp. 845-850.
- M. Nilsback and A. Zisserman, Automated flower classification over a large number of classes, in Proc. Indian Conf. Comput. Vision, Graphics Image Process., Bhubaneswar, India, Dec. 2008, pp. 722-729.
- C. Wah et al., The Caltech-UCSD Birds-200-2011 Dataset, Tech. Report CNS-TR-2011-001, California Institute of Technology, 2011.
- K. Papineni et al., Bleu: A method for automatic evaluation of machine translation, in Proc. Annu. Meeting Association Computat. Linguistics, Philadelphia, PA, USA, July 2002, pp. 311-318.
- C.-Y. Lin, Rouge: a package for automatic evaluation of summaries, in Workshop Text Summarization Branches Out, Post-Conf. Workshop ACL, Barcelona, Spain, July 2004, pp. 74-81.
- S. Banerjee and A. Lavie, Meteor: an automatic metric for MT evaluation with improved correlation with human judgments, in Proc. ACL Workshop Intrinsic Extrinsic Evaluation Measures Mach. Translation Summarization, Ann Arbor, MI, USA, 2005, pp. 65-72.
- R. Vedantam, C.L. Zitnick, and D. Parikh, Cider: Consensus-based image description evaluation, arXiv preprint arXiv:1411.5726, 2014.
- C. Szegedy, S. Ioffe, and V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in Proc. AAAI Conf. Artif. Intell., San Francisco, CA, USA, Feb. 2017, pp. 4278-4284.
- A. Paszke et al., Automatic differentiation in PyTorch, in Proc. NIPS, Long Beach, CA, USA, 2017.