Recent Performance Evaluation of AI Accelerators for Image Recognition and Classification: Focusing on MLPerf

  • Published : 2020.01.30

Abstract

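Demand for hardware and software systems that accelerate artificial intelligence and deep learning is growing explosively. A wide variety of inference systems, tailored to different deep learning models, is also continually being researched and introduced. More than 100 companies worldwide are now developing AI inference chips, with targets ranging from embedded systems to data center solutions. A dozen or more software frameworks and libraries are used in developing this hardware. Because the hardware and software are so diverse, evaluating them neutrally is very difficult. An industry-standard benchmark and set of evaluation criteria for AI is therefore needed, and MLPerf Inference was created to meet that need. MLPerf is driven by 30 or more companies and 200 or more machine learning researchers and practitioners, and it provides consistent rules and methods for comparing systems with entirely different architectures. In 2019, a variety of AI inference hardware was benchmarked for the first time under the MLPerf rules: 14 companies submitted more than 600 measured inference results, obtained on more than 30 systems. This paper reviews the performance of recently developed AI hardware (that is, accelerators) from various companies, focusing on MLPerf training and inference.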
