Recent Trends of Object and Scene Recognition Technologies for Mobile/Embedded Devices

  • Published: 2019.12.01

Abstract

Although deep learning-based visual recognition technology has evolved rapidly, most commonly used methods focus solely on recognition accuracy. However, demand is rising for low-latency, low-power image recognition with acceptable accuracy for practical applications on edge devices. For example, most Internet of Things (IoT) devices have low computing power, which calls for more pragmatic use of these technologies; likewise, drones and smartphones have limited battery capacity, which practical applications must also take into account. Furthermore, some people prefer that central servers not process their private images, as high-performance server-based recognition technologies require. To address these demands, object and scene recognition technologies that enable optimized neural networks to operate in mobile and embedded environments are gaining attention. In this report, we briefly summarize the recent trends and issues in object and scene recognition technologies for mobile and embedded devices.

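Project Information

Research project number: Development of Mobile AR Technology Supporting Object Extraction and Real-Virtual Registration

Supervising institution: Institute of Information & Communications Technology Planning & Evaluation (IITP)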
