Dynamic Core Affinity for High-Performance I/O Devices Supporting Multiple Queues

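  • 조중연 (Division of Computer Engineering, Konkuk University) ;
  • 엄준용 (Division of Computer Engineering, Konkuk University) ;
  • 진현욱 (Division of Computer Engineering, Konkuk University) ;
  • 정성인 (SW Basic Research Center, Electronics and Telecommunications Research Institute)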
  • Received : 2016.02.19
  • Accepted : 2016.04.18
  • Published : 2016.07.15

Abstract

Several studies have reported the impact of core affinity on the network I/O performance of multi-core systems. As the network bandwidth increases significantly, it becomes more important to determine the effective core affinity. Although a framework for dynamic core affinity that considers both network and disk I/O has been suggested, the multiple queues provided by high-speed I/O devices are not properly supported. In this paper, we extend the existing framework of dynamic core affinity to efficiently support the multiple queues of high-speed I/O devices, such as 40 Gigabit Ethernet and NVM Express. Our experimental results show that the extended framework can improve the HDFS file upload throughput by up to 32%, and can provide improved scalability in terms of the number of cores. In addition, we analyze the impact of the assignment policy of multiple I/O queues across a number of cores.

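Various studies have observed the impact of core affinity on network I/O performance in multi-core systems. As network links become progressively faster, an efficient core affinity policy can become an important performance factor. A middleware-level dynamic core affinity framework proposed a core affinity policy that considers network and disk I/O together, but it did not take multiple queues into account. In this paper, we extend the algorithm used in the existing dynamic core affinity framework into a structure that supports multiple queues, and analyze file upload performance on a system equipped with 40 Gigabit Ethernet and NVMe devices. The experimental results confirm that dynamic core affinity with multiple-queue support improves the file upload throughput of the Hadoop Distributed File System by up to 32% and can provide better scalability on many-core systems. We also analyze how performance varies with the combination of multiple queues and discuss the performance factors to consider when deciding how to distribute them.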
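To make the idea of a queue assignment policy concrete, the following is a minimal sketch of one possible static policy: spreading the queues of each device round-robin over a chosen set of cores. It is not the algorithm of the framework described above, which the abstract does not specify. The function names, queue counts, and core numbers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assign_queues_round_robin(num_queues, cores):
    """Map each queue index to a core, cycling through the given cores.

    Hypothetical helper for illustration; not the paper's policy.
    """
    if not cores:
        raise ValueError("at least one core is required")
    return {q: cores[q % len(cores)] for q in range(num_queues)}


def queues_per_core(assignment):
    """Invert a queue-to-core mapping into core -> list of queues."""
    by_core = defaultdict(list)
    for queue, core in sorted(assignment.items()):
        by_core[core].append(queue)
    return dict(by_core)


# Assumed example: 8 NIC queues on cores 0-3, 4 NVMe queues on cores 4-5.
nic_queues = assign_queues_round_robin(num_queues=8, cores=[0, 1, 2, 3])
nvme_queues = assign_queues_round_robin(num_queues=4, cores=[4, 5])

print(queues_per_core(nic_queues))
print(queues_per_core(nvme_queues))
```

A mapping like this only describes the intended placement. On Linux it would still have to be applied, for example by setting interrupt affinity for each queue and pinning the I/O threads to the same cores. Whether NIC and NVMe queues should share cores or be kept apart is the kind of trade-off the assignment-policy analysis in the paper addresses.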

Acknowledgement

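Grant : Basic Research on a Manycore-Based Ultra-High-Performance Scalable OS

Supported by : Institute for Information & communications Technology Promotion (IITP)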
