A review and comparison of convolution neural network models under a unified framework

  • Received : 2021.08.11
  • Accepted : 2021.12.08
  • Published : 2022.03.31

Abstract

There has been active research in image classification using deep learning convolutional neural network (CNN) models. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC, 2010-2017) was one of the most important competitions that boosted the development of efficient deep learning algorithms. This paper introduces and compares six landmark models that achieved high prediction accuracy in the ILSVRC. First, we review the models to illustrate their unique structures and characteristics. We then compare the models under a unified framework; to this end, auxiliary devices that are not crucial to the architecture are excluded. Four popular data sets with different characteristics are then used to measure prediction accuracy. By examining the characteristics of the data sets together with those of the models, we provide some insight into the architectural features of the models.
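One architectural feature that distinguishes the compared model families is how a convolution layer is factorized: Xception (Chollet, 2017) and MobileNets (Howard et al., 2017) replace a standard convolution with a depthwise separable one. A back-of-the-envelope parameter count makes the difference concrete; the layer shapes below are illustrative choices, not values from the paper.

```python
# Parameter counts for a standard convolution vs. the depthwise
# separable convolution used in Xception/MobileNets (biases ignored).
# The k/c_in/c_out values below are hypothetical, for illustration only.

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128                       # a typical mid-network layer
    std = standard_conv_params(k, c_in, c_out)        # 9 * 64 * 128 = 73728
    sep = depthwise_separable_params(k, c_in, c_out)  # 9 * 64 + 64 * 128 = 8768
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this layer the separable form uses roughly 8.4 times fewer weights, which is the kind of structural trade-off the unified comparison in the paper is designed to expose.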

Acknowledgement

Yoonsuh Jung's work was partially supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2019R1A4A1028134 and 2021R1F1A1062347).

References

  1. Chollet F (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1251-1258.
  2. Deng J, Dong W, Socher R, Li LJ, Li K, and Fei-Fei L (2009). ImageNet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.
  3. He K, Zhang X, Ren S, and Sun J (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
  4. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, and Adam H (2017). MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861
  5. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, and Keutzer K (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv preprint arXiv:1602.07360
  6. Ioffe S and Szegedy C (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. In JMLR Workshop and Conference Proceedings, 37, 448-456.
  7. Krizhevsky A, Nair V, and Hinton G (2014). The CIFAR-10 dataset, http://www.cs.toronto.edu/kriz/cifar
  8. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, and Jackel LD (1989). Backpropagation applied to handwritten zip code recognition, Neural Computation, 1, 541-551. https://doi.org/10.1162/neco.1989.1.4.541
  9. Nair V and Hinton GE (2010). Rectified linear units improve restricted boltzmann machines. In ICML'10: Proceedings of the 27th International Conference on International Conference on Machine Learning.
  10. Netzer Y, Wang T, Coates A, Bissacco A, Wu B, and Ng AY (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
  11. Mukkamala MC and Hein M (2017). Variants of RMSProp and Adagrad with logarithmic regret bounds. In Proceedings of the 34th International Conference on Machine Learning, 70, 2545-2553.
  12. Scherer D, Muller A, and Behnke S (2010). Evaluation of pooling operations in convolutional architectures for object recognition, International Conference on Artificial Neural Networks, 92-101.
  13. Simonyan K and Zisserman A (2014). Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556
  14. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, and Salakhutdinov R (2014). Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, 15, 1929-1958.
  15. Szegedy C, Liu W, Jia Y, et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9.
  16. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, and Wojna Z (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818-2826.
  17. Wei W, Yiyang H, Ting Z, Hongmei L, Jin W, and Xin W (2020). A new image classification approach via improved mobilenet models with local receptive field expansion in shallow layers, Computational Intelligence and Neuroscience.
  18. Xiao H, Rasul K, and Vollgraf R (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, arXiv:1708.07747
  19. Zagoruyko S and Komodakis N (2016). Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 87, 12.
  20. Zhang X, Zhou X, Lin M, and Sun J (2018). ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6848-6856.
  21. Zoph B and Le QV (2016). Neural architecture search with reinforcement learning, arXiv preprint arXiv:1611.01578
  22. Zoph B, Vasudevan V, Shlens J, and Le QV (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8697-8710.