|
|
Efficient FPGA-based graph processing with hybrid pull-push computational model |
Chengbo YANG, Long ZHENG( ), Chuangyi GUI, Hai JIN |
National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology,Wuhan 430074, China |
|
|
Abstract Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs. Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations. Nevertheless, considering the limited onchip resources and streamline pipeline computation, the efficiency of hybrid model on FPGAs often suffers due to wellknown random access feature of graph processing. In this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both worlds. Our approach on FPGAs is unique and novel as follow. First, we propose to use edge block (consisting of edges with the same destination vertex set), which allows to sequentially access edges at block granularity for locality while still preserving the precision. Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices, this also enables to skip invalid blocks for reducing redundant computation. Second, we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history. We also present to switch models in advancewith few stalls using their state statistics. Our evaluation on a wide variety of graph algorithms for many realworld graphs shows that our approach achieves up to 3.69×speedup over state-of-the-art FPGA-based graph processing systems.
|
Keywords
graph processing
efficiency
computational model
FPGAs
|
Corresponding Author(s):
Long ZHENG
|
Just Accepted Date: 04 April 2019
Issue Date: 11 March 2020
|
|
1 |
A Lumsdaine, D Gregor, B Hendrickson, B Hendrickson, J Berry. Challenges in parallel graph processing. Parallel Processing Letters, 2007, 17(1): 5–20
https://doi.org/10.1142/S0129626407002843
|
2 |
H Jin, P Yao, X Liao. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): 274–276
https://doi.org/10.1007/s11432-017-9226-8
|
3 |
G Malewicz, M H Austern, A J C Bik, J C Dehnert, I Horn, N Leiser, G Czajkowski. Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD International Conference on Management of Data. 2010, 135–146
https://doi.org/10.1145/1807167.1807184
|
4 |
Y Low, D Bickson, J Gonzalez, C Guestrin, A Kyrola, J M Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
https://doi.org/10.14778/2212351.2212354
|
5 |
J E Gonzalez, Y Low, H Gu, D Bickson, C Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30
|
6 |
A Kyrola, G E Blelloch, C Guestrin. GraphChi: large-scale graph computation on just a PC disk-based graph computation. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46
|
7 |
A Roy, I Mihailovic, W Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
https://doi.org/10.1145/2517349.2522740
|
8 |
S Beamer, K Asanovic, D Patterson. Direction-optimizing breadth-first search. Scientific Programming, 2013, 21(3–4): 137–148
https://doi.org/10.1155/2013/702694
|
9 |
J Shun, G E Blelloch. Ligra: a lightweight graph processing frame work for shared memory. In: Proceedings of SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2013, 135–146
https://doi.org/10.1145/2517327.2442530
|
10 |
D Nguyen, A Lenharth, K Pingali. A lightweight infrastructure for graph analytics. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 456–471
https://doi.org/10.1145/2517349.2522739
|
11 |
W S Han, S Lee, K Park, J H Lee, M S Kim, J Kim, H Yu. Turbo-Graph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 77–85
https://doi.org/10.1145/2487575.2487581
|
12 |
J E Gonzalez, R S Xin, A Dave, D Crankshaw, M J Franklin, I Stoica. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613
|
13 |
X Zhu, W Han, W Chen. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of USENIX Annual Technical Conference. 2015, 375–386
|
14 |
D Sengupta, S L Song, K Agarwal, K Schwan. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 28
https://doi.org/10.1145/2807591.2807655
|
15 |
Y Chi, G Dai, Y Wang, G Sun, G Li, H Yang. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd International Conference on Data Engineering. 2016, 409–420
https://doi.org/10.1109/ICDE.2016.7498258
|
16 |
X Zhu, W Chen, W Zheng, X Ma. Gemini: a computation-centric distributed graph processing system. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2016, 301–316
|
17 |
A Roy, L Bindschaedler, J Malicevic, W Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. 2015, 410–424
https://doi.org/10.1145/2815400.2815408
|
18 |
F Khorasani, K Vora, R Gupta, L N Bhuyan. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. 2014, 239–252
https://doi.org/10.1145/2600212.2600227
|
19 |
Z Fu, M Personick, B Thompson. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data Management Experiences and Systems. 2014, 1–6
https://doi.org/10.1145/2621934.2621936
|
20 |
H Liu, H H Huang. Enterprise: breadth-first graph traversal on GPUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 1–12
https://doi.org/10.1145/2807591.2807594
|
21 |
L Ma, Z Yang, H Chen, J Xue, Y Dai. Garaph: efficient GPUaccelerated graph processing on a single machine with balanced replication. In: Proceedings of USENIX Annual Technical Conference. 2017, 195–207
|
22 |
J Ahn, S Hong, S Yoo, O Mutlu, K Choi. A scalable processing-inmemory accelerator for parallel graph processing. In: Proceedings of International Symposium on Computer Architecture. 2015, 105–117
https://doi.org/10.1145/2872887.2750386
|
23 |
T J Ham, L Wu, N Sundaram, N Satish, M Martonosi. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACMInternational Symposium on Microarchitecture. 2016, 1–13
https://doi.org/10.1109/MICRO.2016.7783759
|
24 |
L Song, Y Zhuo, X Qian, H Li, Y Chen. GraphR: accelerating graph processing using ReRAM. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 531–543
https://doi.org/10.1109/HPCA.2018.00052
|
25 |
M Zhang, Y Zhuo, C Wang, M Gao, Y Wu, K Chen, C Kozyrakis, X Qian. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 544–557
https://doi.org/10.1109/HPCA.2018.00053
|
26 |
E Nurvitadhi, G Weisz, Y Wang, S Hurkat, M Nguyen, J C Hoe, J F Martinez, C Guestrin. GraphGen: an FPGA framework for vertexcentric graph computation. In: Proceedings of the 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. 2014, 25–28
https://doi.org/10.1109/FCCM.2014.15
|
27 |
Y Umuroglu, D Morrison, M Jahre. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications. 2015, 1–8
https://doi.org/10.1109/FPL.2015.7293939
|
28 |
S Zhou, C Chelmis, V K Prasanna. High-throughput and energyefficient graph processing on FPGA. In: Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2016, 103–110
https://doi.org/10.1109/FCCM.2016.35
|
29 |
T Oguntebi, K Olukotun. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 111–117
https://doi.org/10.1145/2847263.2847337
|
30 |
G Dai, Y Chi, Y Wang, H Yang. FPGP: graph processing framework on FPGA a case study of breadth-first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 105–110
https://doi.org/10.1145/2847263.2847339
|
31 |
G Dai, T Huang, Y Chi, N Xu, Y Wang, H Yang. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 217–226
https://doi.org/10.1145/3020078.3021739
|
32 |
J Zhang, S Khoram, J Li. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 207–216
https://doi.org/10.1145/3020078.3021737
|
33 |
J Zhang, J Li. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2018, 229–238
https://doi.org/10.1145/3174243.3174245
|
34 |
N Engelhardt, H K H So. GraVF: a vertex-centric distributed graph processing framework on FPGAs. In: Proceedings of the 26th International Conference on Field-Programmable Custom Computing Machines. 2016, 1–4
https://doi.org/10.1109/FPL.2016.7577360
|
35 |
S Zhou, V K Prasanna. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing. 2017, 137–144
https://doi.org/10.1109/SBAC-PAD.2017.25
|
36 |
Intel Altera. Altera SDK for OpenCL programming guide. Intel Altera, 2013
|
37 |
Intel Altera. Altera SDK for OpenCL best practices guide version 16.0. Intel Altera, 2016
|
38 |
R R McCune, T Weninger, G Madey. Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys, 2015, 48(2): 25
https://doi.org/10.1145/2818185
|
39 |
D J Watts, S H Strogatz. Collective dynamics of small-world networks. Nature, 1998, 393(6684): 440
https://doi.org/10.1038/30918
|
40 |
A S Li, X C Li, Y C Pan, W Zhang. Strategies for network security. Science China Information Sciences, 2015, 58(1): 1–14
https://doi.org/10.1007/s11432-014-5182-9
|
41 |
D Langr, P Tvrdik. Evaluation criteria for sparse matrix storage formats. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 428–440
https://doi.org/10.1109/TPDS.2015.2401575
|
42 |
Z Wang, B He, W Zhang, S Jiang. A performance analysis framework for optimizing OpenCL applications on FPGAs. In: Proceedings of the International Conference on High Performance Computer Architecture. 2016, 114–125
https://doi.org/10.1109/HPCA.2016.7446058
|
43 |
M Aaftab. The OpenCL specification version 1.0. Khronos OpenCL Working Group, 2009
|
44 |
J Leskovec, A Krevl. Snap datasets: stanford large network dataset collection. 2014
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|