- Volume 8 Issue 2
Task-based programming is becoming the method of choice for extracting performance from multi-core chips. It expresses a program in terms of lightweight logical tasks rather than heavyweight threads. Intel Threading Building Blocks (TBB) is a task-based parallel programming library for multi-core processors. The performance gained from TBB depends to a great extent on the efficiency of its parallel constructs. The overheads these constructs incur limit the ability to build large-scale parallel programs, especially when the parallelism is fine-grained. This paper presents a study of TBB parallelization overheads. For this purpose, we developed a TBB micro-benchmark suite called TBBench. We use TBBench to evaluate the parallelization overheads of TBB on different multi-core machines and with different compilers. We report the relative overheads in detail and analyze the results.
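The abstract does not say how TBBench computes an overhead, so the following is only a minimal sketch of one common approach, modeled on the EPCC OpenMP micro-benchmarks: time a fixed piece of work on its own, time the same work when issued through a parallel construct, and report the difference. All names here (`delay`, `seconds_per_call`, `run_on_threads`, `measure_overhead`) are hypothetical, and `std::thread` stands in for a TBB construct so that the sketch compiles without TBB installed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Fixed amount of busy work; volatile keeps the loop from being optimized away.
double delay(std::size_t iters) {
    volatile double acc = 0.0;
    for (std::size_t i = 0; i < iters; ++i) acc = acc + 1.0;
    return acc;
}

// Average wall-clock seconds per call of f over reps repetitions.
template <typename F>
double seconds_per_call(F&& f, int reps) {
    assert(reps > 0);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}

// Stand-in parallel construct (an assumption, not TBB): run the same work
// once on each of p threads, so the ideal time equals the reference time.
template <typename F>
void run_on_threads(unsigned p, F work) {
    std::vector<std::thread> pool;
    pool.reserve(p);
    for (unsigned t = 0; t < p; ++t) pool.emplace_back(work);
    for (auto& th : pool) th.join();
}

// Overhead is the time under the construct minus the reference time.
double overhead_seconds(double t_construct, double t_reference) {
    return t_construct - t_reference;
}

// Time the work alone, then through the given construct, and take the difference.
// `construct` is any callable that accepts the work item and executes it.
template <typename Construct>
double measure_overhead(Construct&& construct, std::size_t iters, int reps) {
    auto work = [iters] { delay(iters); };
    double t_ref = seconds_per_call(work, reps);
    double t_con = seconds_per_call([&] { construct(work); }, reps);
    return overhead_seconds(t_con, t_ref);
}
```

A call such as `measure_overhead([](auto w) { run_on_threads(4, w); }, 100000, 50)` would estimate the cost of creating and joining four threads around the work. To measure a TBB construct instead, the callable passed as `construct` would wrap, for example, `tbb::parallel_for`. Results vary with machine, compiler, and optimization level, and the difference can come out negative when the noise exceeds the overhead.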