2-Stage Detection and Classification Network for Kiosk User Analysis

Two-Stage Detection and Classification Network for Display-Type Vending Machine User Analysis

  • Seo, Ji-Won (Department of Information Convergence Engineering, Pusan National University)
  • Kim, Mi-Kyung (Software Education Center, Pusan National University)
  • Received : 2022.04.07
  • Accepted : 2022.04.20
  • Published : 2022.05.31

Abstract

Machine learning techniques that use visual data are widely applicable in industry and services, in areas such as scene recognition, fault detection, security, and user analysis. Among these, user analysis from CCTV video is one of the practical ways of using vision data. Many studies on lightweight artificial neural networks have also been published to improve usability in mobile and embedded environments. In this study, we propose a network that combines object detection and classification for a mobile graphics processing unit. The network detects pedestrians and faces, then classifies age and gender from each detected face. The proposed network is built on MobileNet, YOLOv2, and skip connections. The detection and classification models are trained individually and then combined into a 2-stage structure. An attention mechanism is also used to improve detection and classification ability. An Nvidia Jetson Nano is used to run and evaluate the proposed system.

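Machine learning techniques that use visual information have become increasingly useful in industry and services, in areas such as scene recognition, defect detection, security, and user analysis. Among these, user analysis based on CCTV footage is a practical application that makes good use of visual information. Research on lightweight neural networks also continues, with the aim of improving practicality in such embedded environments. In this paper, we propose a system for person and face detection and for user age and gender classification that can be deployed on kiosks, that is, display-type vending machines. The proposed model is designed around MobileNet, YOLOv2, and skip connections, and has a 2-stage structure in which the detection and classification networks are trained separately and then combined. An attention mechanism is also used with the aim of improving system performance. The proposed system was run and evaluated on an Nvidia Jetson Nano, a small graphics processing unit.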
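
The 2-stage structure described in the abstract can be illustrated with a minimal sketch: stage 1 detects people and faces, and stage 2 classifies age and gender from each detected face crop. This is not the authors' implementation. All names here (`Detection`, `UserAttributes`, `crop`, `analyze_frame`, `stub_detector`, `stub_classifier`, the `min_score` threshold, and the age-group labels) are hypothetical, and the stub functions merely stand in for the trained MobileNet/YOLOv2-based models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class Detection:
    label: str    # "person" or "face"
    box: Box
    score: float  # detector confidence in [0, 1]


@dataclass
class UserAttributes:
    box: Box
    age_group: str
    gender: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by box out of the image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def analyze_frame(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    classifier: Callable[[List[List[int]]], Tuple[str, str]],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[int, List[UserAttributes]]:
    """Stage 1: detect people and faces. Stage 2: classify each face crop."""
    detections = [d for d in detector(image) if d.score >= min_score]
    people = [d for d in detections if d.label == "person"]
    faces = [d for d in detections if d.label == "face"]

    users = []
    for face in faces:
        age_group, gender = classifier(crop(image, face.box))
        users.append(UserAttributes(face.box, age_group, gender))
    return len(people), users


# Hypothetical stand-ins for the trained detection and classification models.
def stub_detector(image: Image) -> List[Detection]:
    return [
        Detection("person", (0, 0, 4, 4), 0.9),
        Detection("face", (1, 0, 3, 2), 0.8),
        Detection("face", (0, 0, 1, 1), 0.2),  # dropped by the threshold
    ]


def stub_classifier(face_crop: List[List[int]]) -> Tuple[str, str]:
    return ("20-29", "female")


frame = [[0] * 4 for _ in range(4)]
person_count, users = analyze_frame(frame, stub_detector, stub_classifier)
```

Because the two stages only meet at the face crop, either model can be retrained or replaced independently, which matches the abstract's statement that the detection and classification models are trained individually before being combined.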

Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.
