Single Shot Detector for Detecting Clickable Object in Mobile Device Screen

  • Received : 2021.06.28
  • Accepted : 2021.08.11
  • Published : 2022.01.31

Abstract

We propose a novel network architecture and build a dataset for recognizing clickable objects on mobile device screens. The data were collected from multiple applications running on devices with various screen resolutions, and a total of 24,937 annotations were subdivided into seven categories: text, edit text, image, button, region, status bar, and navigation bar. The model uses the Deconvolutional Single Shot Detector (DSSD) as a baseline; the backbone is a Squeeze-and-Excitation network obtained by adding Squeeze-and-Excitation blocks to ResNet, and the Single Shot Detector layers and deconvolution modules are stacked in the manner of Feature Pyramid Networks and connected to the detection head. In addition, to reduce the loss of features caused by the conventional 1:1 input aspect ratio, we change the input resolution to a 1:2 ratio similar to the mobile device screen. Experiments on the dataset we built show that the mean average precision improves by up to 101% over the baseline.
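To make the architecture description concrete, the sketch below illustrates the two components named in the abstract: a Squeeze-and-Excitation block of the kind added to the ResNet backbone, and a DSSD-style deconvolution module that fuses a deeper feature map with a shallower one in a top-down, FPN-like pass. This is an illustrative PyTorch sketch, not the authors' implementation; the class names, channel sizes, reduction ratio, and the 1:2 input resolution used in the demo tensors are assumptions chosen for the example.

```python
# Illustrative sketch only; all sizes and names below are assumptions,
# not the released code of the paper.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using global context (Hu et al.)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pool
        self.fc = nn.Sequential(                       # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # channel-wise rescaling

class DeconvFusion(nn.Module):
    """DSSD-style module: upsample a deeper feature map and fuse it with a
    shallower SSD feature map, top-down as in a feature pyramid."""
    def __init__(self, deep_ch, shallow_ch, out_ch=256):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2)
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.post = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                  nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, deep, shallow):
        # Fused here by addition to keep the sketch short; the DSSD paper also
        # explores element-wise product for this combination step.
        return self.post(self.deconv(deep) + self.lateral(shallow))

if __name__ == "__main__":
    # Feature map with a 1:2 aspect ratio, mirroring a portrait phone screen.
    x = torch.randn(1, 64, 128, 64)
    se = SEBlock(64)
    print(se(x).shape)                     # torch.Size([1, 64, 128, 64])
    deep = torch.randn(1, 512, 16, 8)
    shallow = torch.randn(1, 256, 32, 16)
    fuse = DeconvFusion(512, 256)
    print(fuse(deep, shallow).shape)       # torch.Size([1, 256, 32, 16])
```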

Keywords

Acknowledgement

This work was carried out with support from the 'High-Performance Computing Support' program of the Ministry of Science and ICT and the National IT Industry Promotion Agency.

References

  1. Y. Baek and D. Bae, "Automated model-based android GUI testing using multi-level GUI comparison criteria," in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, Singapore, pp.238-249, 2016.
  2. I. A. Salihu, R. Ibrahim, B. S. Ahmed, K. Z. Zamli, and A. Usman, "AMOGA: A static-dynamic model generation strategy for mobile apps testing," IEEE Access, Vol.7, pp.17158-17173, 2019. https://doi.org/10.1109/access.2019.2895504
  3. A. Usman, N. Ibrahim, and I. A. Salihu, "TEGDroid: Test case generation approach for android apps considering context and GUI events," International Journal on Advanced Science, Engineering and Information Technology, Vol.10, No.1, pp.16-23, 2020. https://doi.org/10.18517/ijaseit.10.1.10194
  4. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, pp.580-587, 2014.
  5. J. R. R. Uijlings, K. E. A. Van De Sande, T. Gevers, and A. W. M. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, Vol.104, No.2, pp.154-171, 2013. https://doi.org/10.1007/s11263-013-0620-5
  6. R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, pp.1440-1448, 2015.
  7. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proceedings of the Advances in Neural Information Processing Systems, Quebec, pp.91-99, 2015.
  8. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proceedings of the European Conference on Computer Vision, Amsterdam, pp.21-37, 2016.
  9. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556v6 [cs.CV] 10 Apr. 2015.
  10. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nevada, pp.779-788, 2016.
  11. C. Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: Deconvolutional single shot detector," arXiv:1701.06659v1 [cs.CV] 23 Jan. 2017.
  12. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Nevada, pp.770-778, 2016.
  13. J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Utah, pp.7132-7141, 2018.
  14. A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in Proceedings of the European Conference on Computer Vision, Amsterdam, pp.483-499, 2016.
  15. Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, "Compact bilinear pooling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Nevada, pp.317-326, 2016.
  16. T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Honolulu, pp.2117-2125, 2017.
  17. M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes (VOC) challenge," International Journal of Computer Vision, Vol.88, No.2, pp.303-338, 2010. https://doi.org/10.1007/s11263-009-0275-4