Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (1) : 73-85    https://doi.org/10.1007/s11704-016-6286-8
RESEARCH ARTICLE
FunctionFlow: coordinating parallel tasks
Xuepeng FAN, Xiaofei LIAO, Hai JIN
Services Computing Technology and System Lab (SCTS) & Cluster and Grid Computing Lab (CGCL), School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract

With the growing popularity of task-based parallel programming, today's task-parallel programming libraries and languages still offer limited support for coordinating parallel tasks. This limitation forces programmers to coordinate parallel tasks with additional independent components, either third-party libraries or extra components within the same programming library or language. Moreover, mixing tasks with coordination components increases the difficulty of task-based programming and blinds schedulers to the dependencies among tasks.

In this paper, we propose a task-based parallel programming library, FunctionFlow, which coordinates tasks without requiring additional independent coordination components. First, we use dependency expressions to represent tasks' termination conditions. The key idea behind dependency expressions is to use && to wait for the termination of both tasks and || to wait for the termination of either task, and to allow such expressions to be combined. Second, as runtime support, we use a lightweight representation for dependency expressions, together with a suspended-task queue that schedules tasks whose prerequisites have not yet completed.
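The dependency-expression idea described above can be sketched as a small expression tree whose leaves are task-termination flags. The sketch below is a hypothetical illustration, not FunctionFlow's actual API: the paper's library is C++ and overloads && and ||, while Python cannot overload `and`/`or`, so `&` and `|` stand in for them here. The `Dep`, `leaf`, and task names are invented for illustration.

```python
# Hypothetical model of a dependency expression (not FunctionFlow's real API):
# a leaf is satisfied once its task terminates; an AND node is satisfied when
# both children are; an OR node when either child is.

class Dep:
    def __init__(self, ready):
        self.ready = ready                # zero-argument readiness predicate

    def __and__(self, other):             # stands in for FunctionFlow's '&&'
        return Dep(lambda: self.ready() and other.ready())

    def __or__(self, other):              # stands in for FunctionFlow's '||'
        return Dep(lambda: self.ready() or other.ready())

def leaf(task):
    """Leaf satisfied once `task` (here just a dict with a 'done' flag) terminates."""
    return Dep(lambda: task["done"])

t1, t2, t3 = {"done": False}, {"done": False}, {"done": False}

# Successor may run once both t1 and t2 finish, or once t3 finishes.
expr = (leaf(t1) & leaf(t2)) | leaf(t3)

assert not expr.ready()                   # nothing has terminated yet
t1["done"] = True
assert not expr.ready()                   # t2 still pending, t3 still pending
t2["done"] = True
assert expr.ready()                       # t1 && t2 satisfied
```

A scheduler with a suspended-task queue, as the abstract describes, would park a task whose `expr.ready()` is false and re-check it as prerequisite tasks terminate, rather than blocking a worker thread.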

Finally, we demonstrate FunctionFlow's effectiveness in two respects: a case study implementing popular parallel patterns with FunctionFlow, and a performance comparison with the state-of-the-art practice, TBB. Our demonstration shows that FunctionFlow can generally coordinate parallel tasks without involving additional components, while delivering performance comparable to TBB.

Keywords: task-parallel programming; task dependency; FunctionFlow; coordination patterns
Corresponding Author(s): Hai JIN   
Just Accepted Date: 19 October 2016   Online First Date: 09 May 2018    Issue Date: 31 January 2019
Cite this article:
Xuepeng FAN, Xiaofei LIAO, Hai JIN. FunctionFlow: coordinating parallel tasks[J]. Front. Comput. Sci., 2019, 13(1): 73-85.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6286-8
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I1/73