Efficient FPGA-based graph processing with hybrid pull-push computational model

doi:10.1007/s11704-019-9020-5

Front. Comput. Sci.

2020, Vol. 14

Issue (4) : 144102 https://doi.org/10.1007/s11704-019-9020-5

RESEARCH ARTICLE

Efficient FPGA-based graph processing with hybrid pull-push computational model

Chengbo YANG, Long ZHENG(

), Chuangyi GUI, Hai JIN

National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology,Wuhan 430074, China

Download: PDF(592 KB)
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks

Abstract

Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs. Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations. Nevertheless, considering the limited onchip resources and streamline pipeline computation, the efficiency of hybrid model on FPGAs often suffers due to wellknown random access feature of graph processing. In this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both worlds. Our approach on FPGAs is unique and novel as follow. First, we propose to use edge block (consisting of edges with the same destination vertex set), which allows to sequentially access edges at block granularity for locality while still preserving the precision. Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices, this also enables to skip invalid blocks for reducing redundant computation. Second, we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history. We also present to switch models in advancewith few stalls using their state statistics. Our evaluation on a wide variety of graph algorithms for many realworld graphs shows that our approach achieves up to 3.69×speedup over state-of-the-art FPGA-based graph processing systems.

Keywords graph processing efficiency computational model FPGAs

Corresponding Author(s): Long ZHENG

Just Accepted Date: 04 April 2019 Issue Date: 11 March 2020

Cite this article:

Chengbo YANG,Long ZHENG,Chuangyi GUI, et al. Efficient FPGA-based graph processing with hybrid pull-push computational model[J]. Front. Comput. Sci., 2020, 14(4): 144102.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-9020-5
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I4/144102

1	A Lumsdaine, D Gregor, B Hendrickson, B Hendrickson, J Berry. Challenges in parallel graph processing. Parallel Processing Letters, 2007, 17(1): 5–20 https://doi.org/10.1142/S0129626407002843
2	H Jin, P Yao, X Liao. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): 274–276 https://doi.org/10.1007/s11432-017-9226-8
3	G Malewicz, M H Austern, A J C Bik, J C Dehnert, I Horn, N Leiser, G Czajkowski. Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD International Conference on Management of Data. 2010, 135–146 https://doi.org/10.1145/1807167.1807184
4	Y Low, D Bickson, J Gonzalez, C Guestrin, A Kyrola, J M Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727 https://doi.org/10.14778/2212351.2212354
5	J E Gonzalez, Y Low, H Gu, D Bickson, C Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30
6	A Kyrola, G E Blelloch, C Guestrin. GraphChi: large-scale graph computation on just a PC disk-based graph computation. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46
7	A Roy, I Mihailovic, W Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488 https://doi.org/10.1145/2517349.2522740
8	S Beamer, K Asanovic, D Patterson. Direction-optimizing breadth-first search. Scientific Programming, 2013, 21(3–4): 137–148 https://doi.org/10.1155/2013/702694
9	J Shun, G E Blelloch. Ligra: a lightweight graph processing frame work for shared memory. In: Proceedings of SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2013, 135–146 https://doi.org/10.1145/2517327.2442530
10	D Nguyen, A Lenharth, K Pingali. A lightweight infrastructure for graph analytics. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 456–471 https://doi.org/10.1145/2517349.2522739
11	W S Han, S Lee, K Park, J H Lee, M S Kim, J Kim, H Yu. Turbo-Graph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 77–85 https://doi.org/10.1145/2487575.2487581
12	J E Gonzalez, R S Xin, A Dave, D Crankshaw, M J Franklin, I Stoica. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613
13	X Zhu, W Han, W Chen. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of USENIX Annual Technical Conference. 2015, 375–386
14	D Sengupta, S L Song, K Agarwal, K Schwan. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 28 https://doi.org/10.1145/2807591.2807655
15	Y Chi, G Dai, Y Wang, G Sun, G Li, H Yang. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd International Conference on Data Engineering. 2016, 409–420 https://doi.org/10.1109/ICDE.2016.7498258
16	X Zhu, W Chen, W Zheng, X Ma. Gemini: a computation-centric distributed graph processing system. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2016, 301–316
17	A Roy, L Bindschaedler, J Malicevic, W Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. 2015, 410–424 https://doi.org/10.1145/2815400.2815408
18	F Khorasani, K Vora, R Gupta, L N Bhuyan. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. 2014, 239–252 https://doi.org/10.1145/2600212.2600227
19	Z Fu, M Personick, B Thompson. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data Management Experiences and Systems. 2014, 1–6 https://doi.org/10.1145/2621934.2621936
20	H Liu, H H Huang. Enterprise: breadth-first graph traversal on GPUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 1–12 https://doi.org/10.1145/2807591.2807594
21	L Ma, Z Yang, H Chen, J Xue, Y Dai. Garaph: efficient GPUaccelerated graph processing on a single machine with balanced replication. In: Proceedings of USENIX Annual Technical Conference. 2017, 195–207
22	J Ahn, S Hong, S Yoo, O Mutlu, K Choi. A scalable processing-inmemory accelerator for parallel graph processing. In: Proceedings of International Symposium on Computer Architecture. 2015, 105–117 https://doi.org/10.1145/2872887.2750386
23	T J Ham, L Wu, N Sundaram, N Satish, M Martonosi. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACMInternational Symposium on Microarchitecture. 2016, 1–13 https://doi.org/10.1109/MICRO.2016.7783759
24	L Song, Y Zhuo, X Qian, H Li, Y Chen. GraphR: accelerating graph processing using ReRAM. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 531–543 https://doi.org/10.1109/HPCA.2018.00052
25	M Zhang, Y Zhuo, C Wang, M Gao, Y Wu, K Chen, C Kozyrakis, X Qian. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of International Symposium on High Performance Computer Architecture. 2018, 544–557 https://doi.org/10.1109/HPCA.2018.00053
26	E Nurvitadhi, G Weisz, Y Wang, S Hurkat, M Nguyen, J C Hoe, J F Martinez, C Guestrin. GraphGen: an FPGA framework for vertexcentric graph computation. In: Proceedings of the 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. 2014, 25–28 https://doi.org/10.1109/FCCM.2014.15
27	Y Umuroglu, D Morrison, M Jahre. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications. 2015, 1–8 https://doi.org/10.1109/FPL.2015.7293939
28	S Zhou, C Chelmis, V K Prasanna. High-throughput and energyefficient graph processing on FPGA. In: Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2016, 103–110 https://doi.org/10.1109/FCCM.2016.35
29	T Oguntebi, K Olukotun. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 111–117 https://doi.org/10.1145/2847263.2847337
30	G Dai, Y Chi, Y Wang, H Yang. FPGP: graph processing framework on FPGA a case study of breadth-first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2016, 105–110 https://doi.org/10.1145/2847263.2847339
31	G Dai, T Huang, Y Chi, N Xu, Y Wang, H Yang. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 217–226 https://doi.org/10.1145/3020078.3021739
32	J Zhang, S Khoram, J Li. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017, 207–216 https://doi.org/10.1145/3020078.3021737
33	J Zhang, J Li. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2018, 229–238 https://doi.org/10.1145/3174243.3174245
34	N Engelhardt, H K H So. GraVF: a vertex-centric distributed graph processing framework on FPGAs. In: Proceedings of the 26th International Conference on Field-Programmable Custom Computing Machines. 2016, 1–4 https://doi.org/10.1109/FPL.2016.7577360
35	S Zhou, V K Prasanna. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing. 2017, 137–144 https://doi.org/10.1109/SBAC-PAD.2017.25
36	Intel Altera. Altera SDK for OpenCL programming guide. Intel Altera, 2013
37	Intel Altera. Altera SDK for OpenCL best practices guide version 16.0. Intel Altera, 2016
38	R R McCune, T Weninger, G Madey. Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys, 2015, 48(2): 25 https://doi.org/10.1145/2818185
39	D J Watts, S H Strogatz. Collective dynamics of small-world networks. Nature, 1998, 393(6684): 440 https://doi.org/10.1038/30918
40	A S Li, X C Li, Y C Pan, W Zhang. Strategies for network security. Science China Information Sciences, 2015, 58(1): 1–14 https://doi.org/10.1007/s11432-014-5182-9
41	D Langr, P Tvrdik. Evaluation criteria for sparse matrix storage formats. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 428–440 https://doi.org/10.1109/TPDS.2015.2401575
42	Z Wang, B He, W Zhang, S Jiang. A performance analysis framework for optimizing OpenCL applications on FPGAs. In: Proceedings of the International Conference on High Performance Computer Architecture. 2016, 114–125 https://doi.org/10.1109/HPCA.2016.7446058
43	M Aaftab. The OpenCL specification version 1.0. Khronos OpenCL Working Group, 2009
44	J Leskovec, A Krevl. Snap datasets: stanford large network dataset collection. 2014

[1]

FCS-0002-19020-CY_suppl_1

Download

[1]	Xinqiao LV, Wei XIAO, Yu ZHANG, Xiaofei LIAO, Hai JIN, Qiangsheng HUA. An effective framework for asynchronous incremental graph processing[J]. Front. Comput. Sci., 2019, 13(3): 539-551.
[2]	Tao TIAN, Hanli WANG. Large-scale video compression: recent advances and challenges[J]. Front. Comput. Sci., 2018, 12(5): 825-839.
[3]	Zhen LI, Yuqing WANG, Tian ZHI, Tianshi CHEN. A survey of neural network accelerators[J]. Front. Comput. Sci., 2017, 11(5): 746-761.
[4]	Hui DOU, Yong QI. An online electricity cost budgeting algorithm for maximizing green energy usage across data centers[J]. Front. Comput. Sci., 2017, 11(4): 661-674.
[5]	Wuyang JU,Jianxin LI,Weiren YU,Richong ZHANG. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.
[6]	Jie WEN, Xiaofeng MENG, Xing HAO, Jianliang XU. An efficient approach for continuous density queries[J]. Front Comput Sci, 2012, 6(5): 581-595.
[7]	Defu CHEN, Zhengsu TAO. An adaptive polling interval and short preamble media access control protocol for wireless sensor networks[J]. Front Comput Sci Chin, 2011, 5(3): 300-307.
[8]	Lili RONG , Tianzhu GUO , Jiyong ZHANG , . A new centrality measure based on sub-tree[J]. Front. Comput. Sci., 2009, 3(3): 356-360.
[9]	Behrouz MAHAM, Mérouane DEBBAH, Are HJ?RUNGNES. Energy-efficient cooperative routing in BER constrained multihop networks[J]. Front Comput Sci Chin, 2009, 3(2): 263-271.
[10]	ZHANG Rong, ZETTSU Koji, KIDAWARA Yutaka, KIYOKI Yasushi. Decentralized architecture for resource management of group-based distributed systems[J]. Front. Comput. Sci., 2008, 2(3): 224-233.
[11]	GAO Ting, YAN Fengli, LI Youcheng, WANG Zhixi. Quantum probabilistically cloning and computation[J]. Front. Comput. Sci., 2008, 2(2): 179-189.
[12]	ZHANG Yunquan, CHEN Guoliang, SUN Guangzhong, MIAO Qiankun. Models of parallel computation: a survey and classification[J]. Front. Comput. Sci., 2007, 1(2): 156-165.

Viewed

Full text

Abstract

Cited

Shared

Discussed