Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2016, Vol. 10 Issue (3) : 462-476    https://doi.org/10.1007/s11704-016-5485-7
RESEARCH ARTICLE
iGraph: an incremental data processing system for dynamic graph
Wuyang JU,Jianxin LI(),Weiren YU,Richong ZHANG
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
 Download: PDF(625 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

With the popularity of social network, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is significantly high. In this paper, we design iGraph, an incremental graph processing system for dynamic graph with its continuous updates. The contributions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertexbased graph computing model to support incremental data processing; 3) detection and rebalance methods of hotspot to address the workload imbalance problem during incremental processing. Through the general-purpose API, iGraph can be used to implement various graph processing algorithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real life datasets, iGraph outperforms the original GraphX in respect of graph update and graph computation.

Keywords big data      distributed system      in-memory computing      graph processing      hotspot detection     
Corresponding Author(s): Jianxin LI   
Just Accepted Date: 16 March 2016   Online First Date: 26 April 2016    Issue Date: 16 May 2016
 Cite this article:   
Wuyang JU,Jianxin LI,Weiren YU, et al. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5485-7
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I3/462
1 Shao Y, Cui B, Ma L. PAGE: a partition aware engine for parallel graph computation. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(2): 518–530
2 Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 135–146
3 Salihoglu S, Widom J. GPS: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 2013, 22:1–22:12
4 Power R, Li J Y. Piccolo: building fast, distributed programs with partitioned tables. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 293–306
5 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Graphlab: a new framework for parallel machine learning. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010, 340–349
6 Pearce R A, Gokhale M, Amato N M. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In: Proceedings of the ACM/IEEE International Conference for High Performance Computing Networking, Storage and Analysis. 2010, 1–11
7 Kang U, Tsourakakis C E, Faloutsos C. PEGASUS: a peta-scale graph mining system. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 229–238
8 Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30
9 Ching A, Edunov S, Kabiljo M, Logothetis D, Muthukrishnan S. One trillion edges: graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 2015, 8(12): 1804–1815
10 Yan D, Cheng J, Lu Y, Ng W. Blogel: a block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment, 2014, 7(14): 1981–1992
11 Zhang Y, Liao X F, Jin H, Lin L, Lu F. An adaptive switching scheme for iterative computing in the cloud. Frontiers of Computer Science, 2014, 8(6): 872–884
12 Zheng X L, Zhong Y G, Zeng D, Wang F Y. Social influence and spread dynamics in social networks. Frontiers of Computer Science, 2012, 6(5): 611–620
13 Kumar R, Novak J, Tomkins A. Structure and evolution of online social networks. In: Philip S Y, Han J, Faloutsos C, eds. Link Mining: Models, Algorithms, and Applications. New York: Springer, 2010, 337–357
14 Yan D, Cheng J, Lu Y, Ng W. Effective techniques for message reduction and load balancing in distributed graph computation. In: Proceedings of the 24th International Conference on World Wide Web. 2015, 1307–1317
15 Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation. 2012, 15–28
16 Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. Graphx: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613
17 Ma S, Li J, Hu C M, Lin X L, Huai J P. Big graph search: challenges and techniques. Frontiers of Computer Science, 2016, 10(3): 387–398
18 Cheatham T, Fahmy A F, Stefanescu D C, Valiant L G. Bulk synchronous parallel computing —a paradigm for transportable software. In: Proceedings of Annual Hawaii International Conference on System Sciences. 1995, 268–275
19 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Distributed graphlab: a framework for machine learning in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
20 Pujol J M, Erramilli V, Siganos G, Yang X, Laoutaris N, Chhabra P, Rodriguez P.The little engine(s) that could: scaling online social networks. ACM SIGCOMM Computer Communication Review, 2011, 41(4): 375–386
21 Mondal J, Deshpande A. Managing large dynamic graphs efficiently. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2012, 145–156
22 Yang S, Yan X, Zong B, Khan A. Towards effective partition management for large graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2012, 517–528
23 Bu Y, Howe B, Balazinska M, Ernst M D. Haloop: efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment, 2010, 3(1): 285–296
24 Logothetis D, Olston C, Reed B, Webb K C, Yocum K. Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 51–62
25 Popa L, Budiu M, Yu Y, Isard M. Dryadinc: reusing work in largescale computations. In: Proceedings of Workshop on Hot Topics in Cloud Computing. 2009
26 Gunda P K, Ravindranath L, Thekkath C A, Yu Y, Zhuang L. Nectar: automatic management of data and computation in datacenters. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 75–88
27 Bhatotia P, Wieder A, Rodrigues R, Acar U A, Pasquin R. Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. 2011
28 Peng D, Dabek F. Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 251–264
29 Murray D G, McSherry F, Isaacs R, Isard M, Barham P, Abadi M. Naiad: a timely dataflow system. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 439–455
30 Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
31 Kyrola A, Blelloch G E, Guestrin C. Graphchi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46
32 Cheng R, Hong J, Kyrola A, Miao Y, Weng X, Wu M, Yang F, Zhou L, Zhao F, Chen E. Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems. 2012, 85–98
33 Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I. Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 423–438
34 Çatalyürek Ü V, Aykanat C, Uçar B. On two-dimensional sparse matrix partitioning: models, methods, and a recipe. SIAM Journal on Scientific Computing, 2010, 32(2): 656–683
35 Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web. Technical Report. 1999
[1]  Supplementary Material Download
[1] Zhihan JIANG, Yan LIU, Xiaoliang FAN, Cheng WANG, Jonathan LI, Longbiao CHEN. Understanding urban structures and crowd dynamics leveraging large-scale vehicle mobility data[J]. Front. Comput. Sci., 2020, 14(5): 145310-.
[2] Chengbo YANG, Long ZHENG, Chuangyi GUI, Hai JIN. Efficient FPGA-based graph processing with hybrid pull-push computational model[J]. Front. Comput. Sci., 2020, 14(4): 144102-.
[3] Meifan ZHANG, Hongzhi WANG, Jianzhong LI, Hong GAO. Diversification on big data in query processing[J]. Front. Comput. Sci., 2020, 14(4): 144607-.
[4] Xingyue CHEN, Tao SHANG, Feng ZHANG, Jianwei LIU, Zhenyu GUAN. Dynamic data auditing scheme for big data storage[J]. Front. Comput. Sci., 2020, 14(1): 219-229.
[5] Samuel IRVING, Bin LI, Shaoming CHEN, Lu PENG, Weihua ZHANG, Lide DUAN. Computer comparisons in the presence of performance variation[J]. Front. Comput. Sci., 2020, 14(1): 21-41.
[6] Xinqiao LV, Wei XIAO, Yu ZHANG, Xiaofei LIAO, Hai JIN, Qiangsheng HUA. An effective framework for asynchronous incremental graph processing[J]. Front. Comput. Sci., 2019, 13(3): 539-551.
[7] Min NIE, Lei YANG, Jun SUN, Han SU, Hu XIA, Defu LIAN, Kai YAN. Advanced forecasting of career choices for college students based on campus big data[J]. Front. Comput. Sci., 2018, 12(3): 494-503.
[8] Xuegang HU, Peng ZHOU, Peipei LI, Jing WANG, Xindong WU. A survey on online feature selection with streaming features[J]. Front. Comput. Sci., 2018, 12(3): 479-493.
[9] Shuai MA,Jia LI,Chunming HU,Xuelian LIN,Jinpeng HUAI. Big graph search: challenges and techniques[J]. Front. Comput. Sci., 2016, 10(3): 387-398.
[10] Jinchuan CHEN, Yueguo CHEN, Xiaoyong DU, Cuiping LI, Jiaheng LU, Suyun ZHAO, Xuan ZHOU. Big data challenge: a data management perspective[J]. Front Comput Sci, 2013, 7(2): 157-164.
[11] Ling LIU. Computing infrastructure for big data processing[J]. Front Comput Sci, 2013, 7(2): 165-170.
[12] Chunjie LUO, Jianfeng ZHAN, Zhen JIA, Lei WANG, Gang LU, Lixin ZHANG, Cheng-Zhong XU, Ninghui SUN. CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications[J]. Front Comput Sci, 2012, 6(4): 347-362.
[13] LI Mei, LEE Wang-Chien, SIVASUBRAMANIAM Anand, ZHAO Jizhong. Supporting nearest neighbors query on high-dimensional data in P2P systems[J]. Front. Comput. Sci., 2008, 2(3): 234-247.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed