Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (4) : 144102    https://doi.org/10.1007/s11704-019-9020-5
RESEARCH ARTICLE
Efficient FPGA-based graph processing with hybrid pull-push computational model
Chengbo YANG, Long ZHENG(), Chuangyi GUI, Hai JIN
National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology,Wuhan 430074, China
 Download: PDF(592 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs. Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations. Nevertheless, considering the limited onchip resources and streamline pipeline computation, the efficiency of hybrid model on FPGAs often suffers due to wellknown random access feature of graph processing. In this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both worlds. Our approach on FPGAs is unique and novel as follow. First, we propose to use edge block (consisting of edges with the same destination vertex set), which allows to sequentially access edges at block granularity for locality while still preserving the precision. Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices, this also enables to skip invalid blocks for reducing redundant computation. Second, we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history. We also present to switch models in advancewith few stalls using their state statistics. Our evaluation on a wide variety of graph algorithms for many realworld graphs shows that our approach achieves up to 3.69×speedup over state-of-the-art FPGA-based graph processing systems.

Keywords graph processing      efficiency      computational model      FPGAs     
Corresponding Author(s): Long ZHENG   
Just Accepted Date: 04 April 2019   Issue Date: 11 March 2020
 Cite this article:   
Chengbo YANG,Long ZHENG,Chuangyi GUI, et al. Efficient FPGA-based graph processing with hybrid pull-push computational model[J]. Front. Comput. Sci., 2020, 14(4): 144102.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-9020-5
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I4/144102
1 A Lumsdaine, D Gregor, B Hendrickson, B Hendrickson, J Berry. Challenges in parallel graph processing. Parallel Processing Letters, 2007, 17(1): 5–20
https://doi.org/10.1142/S0129626407002843
2 H Jin, P Yao, X Liao. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): 274–276
https://doi.org/10.1007/s11432-017-9226-8
3 G Malewicz, M H Austern, A J C Bik, J C Dehnert, I Horn, N Leiser, G Czajkowski. Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD International Conference on Management of Data. 2010, 135–146
https://doi.org/10.1145/1807167.1807184
4 Y Low, D Bickson, J Gonzalez, C Guestrin, A Kyrola, J M Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
https://doi.org/10.14778/2212351.2212354
5 J E Gonzalez, Y Low, H Gu, D Bickson, C Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30
6 A Kyrola, G E Blelloch, C Guestrin. GraphChi: large-scale graph computation on just a PC disk-based graph computation. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46
7 A Roy, I Mihailovic, W Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
https://doi.org/10.1145/2517349.2522740
8 S Beamer, K Asanovic, D Patterson. Direction-optimizing breadth-first search. Scientific Programming, 2013, 21(3–4): 137–148
https://doi.org/10.1155/2013/702694
9 J Shun, G E Blelloch. Ligra: a lightweight graph processing frame work for shared memory. In: Proceedings of SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2013, 135–146
https://doi.org/10.1145/2517327.2442530
10 D Nguyen, A Lenharth, K Pingali. A lightweight infrastructure for graph analytics. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 456–471
https://doi.org/10.1145/2517349.2522739
11 W S Han, S Lee, K Park, J H Lee, M S Kim, J Kim, H Yu. Turbo-Graph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 77–85
https://doi.org/10.1145/2487575.2487581
12 J E Gonzalez, R S Xin, A Dave, D Crankshaw, M J Franklin, I Stoica. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613
13 X Zhu, W Han, W Chen. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of USENIX Annual Technical Conference. 2015, 375–386
14 D Sengupta, S L Song, K Agarwal, K Schwan. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 28
https://doi.org/10.1145/2807591.2807655
15 Y Chi, G Dai, Y Wang, G Sun, G Li, H Yang. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd International Conference on Data Engineering. 2016, 409–420
https://doi.org/10.1109/ICDE.2016.7498258
16 X Zhu, W Chen, W Zheng, X Ma. Gemini: a computation-centric distributed graph processing system. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2016, 301–316
17 A Roy, L Bindschaedler, J Malicevic, W Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. 2015, 410–424
https://doi.org/10.1145/2815400.2815408
18 F Khorasani, K Vora, R Gupta, L N Bhuyan. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. 2014, 239–252
https://doi.org/10.1145/2600212.2600227
19 Z Fu, M Personick, B Thompson. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data Management Experiences and Systems. 2014, 1–6
https://doi.org/10.1145/2621934.2621936
20 H Liu, H H Huang. Enterprise: breadth-first graph traversal on GPUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 1–12
https://doi.org/10.1145/2807591.2807594
21 L Ma, Z Yang, H Chen, J Xue, Y Dai. Garaph: efficient GPUaccelerated graph processing on a single machine with balanced replication. In: Proceedings of USENIX Annual Technical Conference. 2017, 195–207
22 J Ahn, S Hong, S Yoo, O Mutlu, K Choi. A scalable processing-inmemory accelerator for parallel graph processing. In: Proceedings of International Symposium on Computer Architecture. 2015, 105–117
https://doi.org/10.1145/2872887.2750386
23 T J Ham, L Wu, N Sundaram, N Satish, M Martonosi. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACMInternational Symposium on Microarchitecture. 2016, 1–13
https://doi.org/10.1109/MICRO.2016.7783759
24 L Song, Y Zhuo, X Qian, H Li, Y Chen. GraphR: accelerating graph processing using ReRAM. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 531–543
https://doi.org/10.1109/HPCA.2018.00052
25 M Zhang, Y Zhuo, C Wang, M Gao, Y Wu, K Chen, C Kozyrakis, X Qian. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 544–557
https://doi.org/10.1109/HPCA.2018.00053
26 E Nurvitadhi, G Weisz, Y Wang, S Hurkat, M Nguyen, J C Hoe, J F Martinez, C Guestrin. GraphGen: an FPGA framework for vertexcentric graph computation. In: Proceedings of the 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. 2014, 25–28
https://doi.org/10.1109/FCCM.2014.15
27 Y Umuroglu, D Morrison, M Jahre. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications. 2015, 1–8
https://doi.org/10.1109/FPL.2015.7293939
28 S Zhou, C Chelmis, V K Prasanna. High-throughput and energyefficient graph processing on FPGA. In: Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2016, 103–110
https://doi.org/10.1109/FCCM.2016.35
29 T Oguntebi, K Olukotun. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 111–117
https://doi.org/10.1145/2847263.2847337
30 G Dai, Y Chi, Y Wang, H Yang. FPGP: graph processing framework on FPGA a case study of breadth-first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 105–110
https://doi.org/10.1145/2847263.2847339
31 G Dai, T Huang, Y Chi, N Xu, Y Wang, H Yang. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 217–226
https://doi.org/10.1145/3020078.3021739
32 J Zhang, S Khoram, J Li. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 207–216
https://doi.org/10.1145/3020078.3021737
33 J Zhang, J Li. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2018, 229–238
https://doi.org/10.1145/3174243.3174245
34 N Engelhardt, H K H So. GraVF: a vertex-centric distributed graph processing framework on FPGAs. In: Proceedings of the 26th International Conference on Field-Programmable Custom Computing Machines. 2016, 1–4
https://doi.org/10.1109/FPL.2016.7577360
35 S Zhou, V K Prasanna. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing. 2017, 137–144
https://doi.org/10.1109/SBAC-PAD.2017.25
36 Intel Altera. Altera SDK for OpenCL programming guide. Intel Altera, 2013
37 Intel Altera. Altera SDK for OpenCL best practices guide version 16.0. Intel Altera, 2016
38 R R McCune, T Weninger, G Madey. Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys, 2015, 48(2): 25
https://doi.org/10.1145/2818185
39 D J Watts, S H Strogatz. Collective dynamics of small-world networks. Nature, 1998, 393(6684): 440
https://doi.org/10.1038/30918
40 A S Li, X C Li, Y C Pan, W Zhang. Strategies for network security. Science China Information Sciences, 2015, 58(1): 1–14
https://doi.org/10.1007/s11432-014-5182-9
41 D Langr, P Tvrdik. Evaluation criteria for sparse matrix storage formats. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 428–440
https://doi.org/10.1109/TPDS.2015.2401575
42 Z Wang, B He, W Zhang, S Jiang. A performance analysis framework for optimizing OpenCL applications on FPGAs. In: Proceedings of the International Conference on High Performance Computer Architecture. 2016, 114–125
https://doi.org/10.1109/HPCA.2016.7446058
43 M Aaftab. The OpenCL specification version 1.0. Khronos OpenCL Working Group, 2009
44 J Leskovec, A Krevl. Snap datasets: stanford large network dataset collection. 2014
[1] FCS-0002-19020-CY_suppl_1 Download
[1] Xinqiao LV, Wei XIAO, Yu ZHANG, Xiaofei LIAO, Hai JIN, Qiangsheng HUA. An effective framework for asynchronous incremental graph processing[J]. Front. Comput. Sci., 2019, 13(3): 539-551.
[2] Tao TIAN, Hanli WANG. Large-scale video compression: recent advances and challenges[J]. Front. Comput. Sci., 2018, 12(5): 825-839.
[3] Zhen LI, Yuqing WANG, Tian ZHI, Tianshi CHEN. A survey of neural network accelerators[J]. Front. Comput. Sci., 2017, 11(5): 746-761.
[4] Hui DOU, Yong QI. An online electricity cost budgeting algorithm for maximizing green energy usage across data centers[J]. Front. Comput. Sci., 2017, 11(4): 661-674.
[5] Wuyang JU,Jianxin LI,Weiren YU,Richong ZHANG. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.
[6] Jie WEN, Xiaofeng MENG, Xing HAO, Jianliang XU. An efficient approach for continuous density queries[J]. Front Comput Sci, 2012, 6(5): 581-595.
[7] Defu CHEN, Zhengsu TAO. An adaptive polling interval and short preamble media access control protocol for wireless sensor networks[J]. Front Comput Sci Chin, 2011, 5(3): 300-307.
[8] Lili RONG , Tianzhu GUO , Jiyong ZHANG , . A new centrality measure based on sub-tree[J]. Front. Comput. Sci., 2009, 3(3): 356-360.
[9] Behrouz MAHAM, Mérouane DEBBAH, Are HJ?RUNGNES. Energy-efficient cooperative routing in BER constrained multihop networks[J]. Front Comput Sci Chin, 2009, 3(2): 263-271.
[10] ZHANG Rong, ZETTSU Koji, KIDAWARA Yutaka, KIYOKI Yasushi. Decentralized architecture for resource management of group-based distributed systems[J]. Front. Comput. Sci., 2008, 2(3): 224-233.
[11] GAO Ting, YAN Fengli, LI Youcheng, WANG Zhixi. Quantum probabilistically cloning and computation[J]. Front. Comput. Sci., 2008, 2(2): 179-189.
[12] ZHANG Yunquan, CHEN Guoliang, SUN Guangzhong, MIAO Qiankun. Models of parallel computation: a survey and classification[J]. Front. Comput. Sci., 2007, 1(2): 156-165.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed