Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (6) : 1309-1325    https://doi.org/10.1007/s11704-018-7167-0
RESEARCH ARTICLE
An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment
Wenjie LIU(), Zhanhuai LI
School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
 Download: PDF(652 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

N-hop neighborhoods information is very useful in analytic tasks on large-scale graphs, like finding clique in a social network, recommending friends or advertising links according to one’s interests, predicting links among websites and etc. To get the N-hop neighborhoods information on a large graph, such as a web graph, a twitter social graph, the most straightforward method is to conduct a breadth first search (BFS) on a parallel distributed graph processing framework, such as Pregel and GraphLab. However, due to the massive volume of message transfer, the BFS method results in high communication cost and has low efficiency.

In this work, we propose a key/value based method, namely KVB, which perfectly fits into the prevailing parallel graph processing framework and computes N-hop neighborhoods on a large scale graph efficiently. Unlike the BFS method, our method need not transfer large amount of neighborhoods information, thus, significantly reduces the overhead on both the communication and intermediate results in the distributed framework.We formalize the N-hop neighborhoods query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution to efficiently load only the relevant neighborhoods for computation. Specially, we prove the optimal partial neighborhoods load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on a distributed graph framework- Spark GraphX and validated our solution with extensive experiments over a number of real world and synthetic large graphs on a modest indoor cluster. Experiments show that our solution generally gains an order of magnitude speedup comparing to the state-of-art BFS implementation.

Keywords N-hop neighborhoods      graph mining      parallel computing      distributed computing     
Corresponding Author(s): Wenjie LIU   
Just Accepted Date: 15 November 2017   Online First Date: 22 October 2018    Issue Date: 19 July 2019
 Cite this article:   
Wenjie LIU,Zhanhuai LI. An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment[J]. Front. Comput. Sci., 2019, 13(6): 1309-1325.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7167-0
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I6/1309
1 A Quamar, A Deshpande, J Lin. NScale: neighborhood-centric largescale graph analytics in the cloud. The VLDB Journal—The International Journal on Very Large Data Bases, 2016, 25(2): 125–150
2 Y Fang, R Cheng, S Luo, J Hu. Effective community search for large attributed graphs. Proceedings of the VLDB Endowment, 2016, 9(12): 1233–1244
https://doi.org/10.14778/2994509.2994538
3 S Xu, S Su, L Xiong, X Cheng, K Xiao. Differentially private frequent subgraph mining. In: Proceedings of the 32nd IEEE International Conference on Data Engineering. 2016, 229–240
https://doi.org/10.1109/ICDE.2016.7498243
4 P R Tadimety. Six Degrees of Separation. OSPF: A Network Routing Protocol, Apress, Berkeley, 2015, 1–2
https://doi.org/10.1007/978-1-4842-1410-7
5 G Calinescu. Computing 2-hop neighborhoods in Ad Hoc wireless networks. In: Proceedings of the International Conference on Ad-Hoc Networks and Wireless. 2003, 175–186
https://doi.org/10.1007/978-3-540-39611-6_16
6 J Gui, K Zhou. Flexible adjustments between energy and capacity for topology control in heterogeneous wireless multi-hop networks. Journal of Network and Systems Management, 2016, 24(4): 789–812
https://doi.org/10.1007/s10922-016-9367-y
7 M Diop, C Pham, O Thiaré. 2-hop neighborhood information for cover set selection in mission-critical surveillance with wireless image sensor networks. In: Proceedings of IFIP Wireless Days International Conference. 2013, 1–7
8 G Malewicz, M H Austern, A J Bik, J C Dehnert, I Horn, N Leiser, G Czajkowski. Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 135–146
https://doi.org/10.1145/1807167.1807184
9 Y Low, J E Gonzalez, A Kyrola, D Bickson, C E Guestrin, J Hellerstein. Graphlab: a new framework for parallel machine learning. 2014, arXiv preprint arXiv:1408.2041
10 Y Lu, J Cheng, D Yan, H Wu. Large-scale distributed graph computing systems: an experimental evaluation. Proceedings of the VLDB Endowment, 2014, 8(3): 281–292
https://doi.org/10.14778/2735508.2735517
11 H Liu, H H Huang, Y Hu. IBFS: concurrent breadth-first search on gpus. In: Proceedings of the 2016 International Conference on Management of Data. 2016, 403–416
https://doi.org/10.1145/2882903.2882959
12 A Clauset, C R Shalizi, M E Newman. Power-law distributions in empirical data. SIAM Review, 2009, 51(4): 661–703
https://doi.org/10.1137/070710111
13 K Shvachko, H Kuang, S Radia, R Chansler. The hadoop distributed file system. In: Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST). 2010, 1–10
https://doi.org/10.1109/MSST.2010.5496972
14 M Zaharia, M Chowdhury, T Das, A Dave, J Ma, M McCauley, M J Franklin, S Shenker, I Stoica. Resilient distributed datasets: a faulttolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. 2012, 2
15 M Bernaschi, G Carbone, E Mastrostefano, F Vella. Solutions to the stconnectivity problem using a GPU-based distributed BFS. Journal of Parallel and Distributed Computing, 2015, 76: 145–153
https://doi.org/10.1016/j.jpdc.2014.09.013
16 J F Hair, W C Black, B J Babin, R E Anderson, R L Tatham. Multivariate Data Analysis. Pearson Prentice Hall Upper Saddle River, NJ, 2006
17 D J Ketchen, C L Shook. The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal, 1996, 17(6): 441–458
https://doi.org/10.1002/(SICI)1097-0266(199606)17:6<441::AID-SMJ819>3.0.CO;2-G
18 H Akaike. Information Theory and an Extension of the Maximum Likelihood Principle. Selected Papers of Hirotugu Akaike,Springer, New York, 1998, 199–213
https://doi.org/10.1007/978-1-4612-1694-0_15
19 H Bhat, N Kumar. On the derivation of the bayesian information criterion. School of Natural Sciences, University of California, 2010
20 A Linde. DIC in variable selection. Statistica Neerlandica, 2005, 59(1): 45–56
https://doi.org/10.1111/j.1467-9574.2005.00278.x
21 A Vukotic, N Watt, T Abedrabbo, D Fox, J Partner. Neo4j in Action. Manning Publications Co., 2014
22 R S Xin, J E Gonzalez, M J Franklin, I Stoica. Graphx: a resilient distributed graph system on spark. In: Proceedings of the International Workshop on Graph Data Management Experiences and Systems. 2013, 1–6
https://doi.org/10.1145/2484425.2484427
23 G Csardi. The igraph software package for complex network research. InterJournal Complex Systems, 2006, 1695(5): 1–9
24 C Avery. Giraph: large-scale graph processing infrastructure on hadoop. Proceedings of the Hadoop Summit. Santa Clara, 2011, 11(3): 5–9
25 H Shang, M Kitsuregawa. Efficient breadth-first search on large graphs with skewed degree distributions. In: Proceedings of the 16th International Conference on Extending Database Technology. 2013, 311–322
https://doi.org/10.1145/2452376.2452413
26 D Yan, J Cheng, Y Lu, W Ng. Blogel: a block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment, 2014, 7(14): 1981–1992
https://doi.org/10.14778/2733085.2733103
27 J Ugander, B Karrer, L Backstrom, C Marlow. The anatomy of the facebook social graph. 2011, arXiv preprint arXiv:1111.4503
[1] Yu ZHOU, Nvqi ZHOU, Tingting HAN, Jiayi GU, Weigang WU. Probabilistic verification of hierarchical leader election protocol in dynamic systems[J]. Front. Comput. Sci., 2018, 12(4): 763-776.
[2] Guoliang CHEN, Rui MAO, Kezhong LU. A parallel computing framework for big data[J]. Front. Comput. Sci., 2017, 11(4): 608-621.
[3] Shuai MA,Jia LI,Chunming HU,Xuelian LIN,Jinpeng HUAI. Big graph search: challenges and techniques[J]. Front. Comput. Sci., 2016, 10(3): 387-398.
[4] Qinma KANG, Hong HE. Task assignment for minimizing application completion time using honeybee mating optimization[J]. Front Comput Sci, 2013, 7(3): 404-415.
[5] Ying ZHANG, Gang HUANG, Wei ZHANG, Xuanzhe LIU, Hong MEI. Towards module-based automatic partitioning of Java applications[J]. Front Comput Sci, 2012, 6(6): 725-740.
[6] Zeyao MO, Aiqing ZHANG, Xiaolin CAO, Qingkai LIU, Xiaowen XU, Hengbin AN, Wenbing PEI, Shaoping ZHU, . JASMIN: a parallel software infrastructure for scientific computing[J]. Front. Comput. Sci., 2010, 4(4): 480-488.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed