Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (5) : 887-907    https://doi.org/10.1007/s11704-018-6400-1
RESEARCH ARTICLE
A communication-reduced and computation-balanced framework for fast graph computation
Yongli CHENG1, Fang WANG2,3(), Hong JIANG4, Yu HUA2,3, Dan FENG2,3, Lingling ZHANG2,3, Jun ZHOU2,3
1. College of Mathematics and Computer Science, FuZhou University, Fuzhou 350116, China
2. Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology,Wuhan 430074, China
3. Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen 518300, China
4. Department of Computer Science & Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
 Download: PDF(1064 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graphprocessing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSP. We use this model to design and implement a high-performance distributed graphprocessing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a highbandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.

Keywords graph computation      communication decrease      computation balance     
Corresponding Author(s): Fang WANG   
Just Accepted Date: 25 September 2017   Online First Date: 06 August 2018    Issue Date: 21 September 2018
 Cite this article:   
Yongli CHENG,Fang WANG,Hong JIANG, et al. A communication-reduced and computation-balanced framework for fast graph computation[J]. Front. Comput. Sci., 2018, 12(5): 887-907.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-6400-1
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I5/887
1 Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8): 103–111
https://doi.org/10.1145/79173.79181
2 Malewicz G, Austern M H, Bik A J C. Pregel: a system for large-scale graph processing. In: Proceedings of ACM International Conference on Management of Data. 2010, 135–146
https://doi.org/10.1145/1807167.1807184
3 Salihoglu S, Widom J. GPS: a graph processing system. In: Proceedings of ACM International Conference on Scientific and Statistical Database Management. 2013, 22–32
https://doi.org/10.1145/2484838.2484843
4 Ching A, Edunov S, Kabiljo K, Logothetis D, Muthukrishnan S. One trillion edges: graph processing at facebook-scale. Proceedings of the VIDB Endowment, 2015, 8(12): 1804–1815
https://doi.org/10.14778/2824032.2824077
5 Wang G, Xie W, Demers A J, Gehrke J. Asynchronous large-scale graph processing made easy. in: Proceedings of International Conference on Innovation Database Research. 2013, 135–146
6 Simmhan Y, Kumbhare A, Wickramaarachchi C, Nagarkar S, Ravi S, Raghavendra C, Prasanna V. Goffish: a sub-graph centric framework for large-scale graph analytics. In: Proceedings of European Conference on Parallel Processing. 2014, 451–462
https://doi.org/10.1007/978-3-319-09873-9_38
7 Kyrola A, Blelloch G E, Guestrin G. Graphchi: large-scale graph computation on just a PC. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation. 2012, 31–46
8 Cheng Y L, Wang F, Jiang H, Hua Y, Feng D, Wang X N. DD-graph: a highly cost-effective distributed disk-based graph-processing framework. In: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. 2016, 259–262
https://doi.org/10.1145/2907294.2907299
9 Cheng Y L, Wang F, Jiang H, Hua Y, Feng D, Wang X N. LCC-graph: a high-performance graph-processing framework with low communication costs. In: Proceedings of IEEE/ACM International Symposium on Quality of Service. 2016, 91–100
10 Cheng Y L, Jiang H, Wang F, Hua Y, Feng D. BlitzG: exploiting highbandwidth networks for fast graph processing. In: Proceedings of IEEE International Conference on Computer Communications. 2017, 2340–2348
11 Page L. The pagerank citation ranking : bringing order to the web. Stanford Digital Libraries Working Paper, 1998, 9(1): 1–14
12 Stefano L D, Bulgarelli A. A simple and efficient connected components labeling algorithm. In: Proceedings of International Conference on Image Analysis and Processing. 1999, 322–327
https://doi.org/10.1109/ICIAP.1999.797615
13 Fortunato S. Community detection in graphs. Physics Reports, 2010, 486(3): 75–174
https://doi.org/10.1016/j.physrep.2009.11.002
14 Kothari R, Jain V. Learning from labeled and unlabeled data. In: Proceedings of IEEE International Joint Conference on Neural Networks. 2002, 2803–2808
https://doi.org/10.1109/IJCNN.2002.1007592
15 Lee M, Kim E J, Yousif M. Security enhancement in infiniband architecture. In: Proceedings of IEEE International Parallel and Distributed Processing Symposium. 2005, 105–114
16 Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. Journal of Scientific Computing, 1998, 20(1): 359–392
https://doi.org/10.1137/S1064827595287997
17 Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs. Journal of Bell System Technical, 1970, 49(2): 291–307
https://doi.org/10.1002/j.1538-7305.1970.tb01770.x
18 Bui T N, Moon B R. Genetic algorithm and graph partitioning. IEEE Transactions on Computers, 1996, 45(7): 841–855
https://doi.org/10.1109/12.508322
19 Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
https://doi.org/10.1145/2517349.2522740
20 Nishimura J, Ugander J. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining. 2013, 1106–1114
https://doi.org/10.1145/2487575.2487696
21 Low Y, Bickson D, Gonzalez J E, Guestrin C, Kyrola A. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
https://doi.org/10.14778/2212351.2212354
22 Gonzalez J E, Low Y, Haijie G, Danny B, Carlos G. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation. 2012, 17–30
23 Backstrom L, Huttenlocher D, Kleinberg J, Lan X. Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining. 2006, 44–54
https://doi.org/10.1145/1150402.1150412
24 Yan D, Cheng J, Xing K, Lu Y, Ng W, Bu Y G. Pregel algorithms for graph connectivity problems with performance guarantees. Proceedings of the VLDB Endowment, 2014, 7(14): 1821–1832
https://doi.org/10.14778/2733085.2733089
25 Han M, Daudjee K. Giraph unchained: barrierless asynchronous parallel execution in pregel-like graph processing systems. Proceedings of the VLDB Endowment, 2015, 8(9): 950–961
https://doi.org/10.14778/2777598.2777604
26 Yang S, Yan X, Zong B, Khan A. Towards effective partition management for large graphs. In: Proceedings of ACM Conference on Management of Data. 2012: 517–528
https://doi.org/10.1145/2213836.2213895
27 Xin R S, Crankshaw D, Dave A, Gonzalez J E, Franklin M J, Stoica I. Graphx: unifying data-parallel and graph-parallel analytics, Computer Science, 2014, 8(3): 125–137
28 Da Y, Yingyi B, Yuanyuan T, Deshpande A. Big graph analytics platforms. Foundations and Trends in Databases, 2017, 7(2): 180–195
29 Lu Y, Cheng J, Yan D, Wu H. Large-scale distributed graph computing systems: an experimental evaluation. Proceedings of the VLDB Endowment, 2014, 8(3): 281–292
https://doi.org/10.14778/2735508.2735517
30 Zhou C, Gao J, Sun B, Yu J X. Mocgraph: scalable distributed graph processing using message online computing. Proceedings of the VLDB Endowment, 2014, 8(4): 377–388
https://doi.org/10.14778/2735496.2735501
31 Yan D, Cheng J, Lu Y, Ng Y. Blogel: a block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment, 2014, 7(14): 1981–1992
https://doi.org/10.14778/2733085.2733103
32 Yan D, Cheng J, Ozsu M T, Lu Y. A general-purpose query-centric framework for querying big graphs. Proceedings of the VLDB Endowment, 2016, 9(7): 564–575
https://doi.org/10.14778/2904483.2904488
33 Devine K D, Boman E G, Heaphy R T. New challenges in dynamic load balancing. Applied Numerical Mathematics, 2005, 52(2): 133–152
https://doi.org/10.1016/j.apnum.2004.08.028
34 Cheng J, Liu Q, Li Z, Fan W, Lui J C S. VENUS: vertex-centric streamlined graph computation on a single PC. In: Proceedings of IEEE International Conference on Data Engineering. 2015, 1131–1142
https://doi.org/10.1109/ICDE.2015.7113362
35 Malicevic J, Roy A, Zwaenepoel W. Scale-up graph processing in the cloud:challenges and solutions. In: Proceedings of International Workshop on Cloud Data and Platforms. 2014, 1–6
https://doi.org/10.1145/2592784.2592789
36 Pearce R, Gokhale M, Amato N M. Scaling techniques for massive scale-free graphs in distributed (external) memory. In: Proceedings of International Symposium on Parallel and Distributed Processing. 2013, 825–836
https://doi.org/10.1109/IPDPS.2013.72
37 Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: scale-out graph processing from secondary storage. In: Proceedings of Symposium on Operating Systems Principles. 2015, 410–424
https://doi.org/10.1145/2815400.2815408
38 Zhu X, Han W, Chen W. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of Usenix Conference on Usenix Technical Conference. 2015, 375–386
39 Xu N, Chen L, Cui B. LogGP: a log-based dynamic graph partitioning method. Proceedings of the VLDB Endowment, 2014, 7(14): 1917–1928
https://doi.org/10.14778/2733085.2733097
40 Ming W, Fan Y, Jilong X, Xiao W, Miao Y. GRAM: scaling graph computation to the trillions. In: Proceedings of ACM Symposium on Cloud Computing. 2015, 408–421
[1] Qiang LIU, Xiaoshe DONG, Heng CHEN, Yinfeng WANG. IncPregel: an incremental graph parallel computation model[J]. Front. Comput. Sci., 2018, 12(6): 1076-1089.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed