Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education in 2018 (2018R1D1A1B07043858), and by the University ICT Research Center support program (IITP-2021-2018-0-01431) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP) and funded by the Ministry of Science and ICT.