Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

邮发代号 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2024, Vol. 18 Issue (3): 183105   https://doi.org/10.1007/s11704-023-2361-0
  本期目录
A disk I/O optimized system for concurrent graph processing jobs
Xianghao XU1,2, Fang WANG2, Hong JIANG3, Yongli CHENG4,5(), Dan FENG2, Peng FANG2
1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
2. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
3. Department of Computer Science & Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
4. College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
5. Zhejiang Lab, Hangzhou 311121, China
 全文: PDF(17413 KB)   HTML
Abstract

In order to analyze and process the large graphs with high cost efficiency, researchers have developed a number of out-of-core graph processing systems in recent years based on just one commodity computer. On the other hand, with the rapidly growing need of analyzing graphs in the real-world, graph processing systems have to efficiently handle massive concurrent graph processing (CGP) jobs. Unfortunately, due to the inherent design for single graph processing job, existing out-of-core graph processing systems usually incur unnecessary data accesses and severe competition of I/O bandwidth when handling the CGP jobs. In this paper, we propose GraphCP, a disk I/O optimized out-of-core graph processing system that efficiently supports the processing of CGP jobs. GraphCP proposes a benefit-aware sharing execution model to share the I/O access and processing of graph data among the CGP jobs and adaptively schedule the graph data loading based on the states of vertices, which efficiently overcomes above challenges faced by existing out-of-core graph processing systems. Moreover, GraphCP adopts a dependency-based future-vertex updating model so as to reduce disk I/Os in the future iterations. In addition, GraphCP organizes the graph data with a Source-Sorted Sub-Block graph representation for better processing capacity and I/O access locality. Extensive evaluation results show that GraphCP is 20.5× and 8.9× faster than two out-of-core graph processing systems GridGraph and GraphZ, and 3.5× and 1.7× faster than two state-of-art concurrent graph processing systems Seraph and GraphSO.

Key wordsgraph processing    disk I/O    concurrent jobs
收稿日期: 2022-06-16      出版日期: 2023-04-26
Corresponding Author(s): Yongli CHENG   
 引用本文:   
. [J]. Frontiers of Computer Science, 2024, 18(3): 183105.
Xianghao XU, Fang WANG, Hong JIANG, Yongli CHENG, Dan FENG, Peng FANG. A disk I/O optimized system for concurrent graph processing jobs. Front. Comput. Sci., 2024, 18(3): 183105.
 链接本文:  
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-023-2361-0
https://academic.hep.com.cn/fcs/CN/Y2024/V18/I3/183105
Fig.1  
Fig.2  
Fig.3  
Notation Definition
V The vertex set
E The edge set
P Number of intervals
J Number of CGP jobs
A Active vertex set in current iteration
M Size of an edge structure value
N Size of a vertex value
W Size of an edge weight value
Sseq Size of edges that are sequentially read
Sran Size of edges that are randomly read
Trr Random read bandwidth
Trw Random write bandwidth
Tsr Sequential read bandwidth
Tsw Sequential write bandwidth
Tab.1  
  
Fig.4  
Fig.5  
  
  
  
  
Dataset Vertices Edges Type
LiveJournal 4.8 million 69 million Social graph
Twitter2010 42 million 1.5 billion Social graph
SK2005 51 million 1.9 billion Social graph
UK2007 106 million 3.7 billion Web graph
Kron30 1 billion 32 billion Synthetic graph
Tab.2  
LJ TT SK UK KR
GridGraph 16.9 3115.4 3807.1 6911.2 ?
GraphZ 9.4 1639.6 1586.3 2303.7 ?
Seraph 9.6 328.2 768.4 1573.3 130944.5
GraphSO 8.5 197.5 326.1 702.5 62354.5
GraphCP-v1 6.3 205.1 349.3 561.9 56932.4
GraphCP 6.4 128.2 232.9 330.5 40666.2
Tab.3  
Fig.6  
Fig.7  
Workload Algorithms Characteristic
Het {PR, BFS, WCC, SSSP} Heterogeneous
Hom1 {BFS} × 4 Homogeneous
Hom2 {SSSP} × 4 Homogeneous
Sim1 {BFS, SSSP} × 2 Similar Algorithms
Sim2 {PR, WCC} × 2 Similar Algorithms
Tab.4  
Fig.8  
Fig.9  
Fig.10  
Dataset Computation overheads/s Reduced I/O overheads/s
Twitter2010 3.4 71.6
SK2005 12.1 214.5
UK2007 15.2 618.1
Kron30 293.1 11386.5
Tab.5  
Fig.11  
Fig.12  
Fig.13  
Fig.14  
Fig.15  
  
  
  
  
  
  
1 G, Malewicz M H, Austern A J C, Bik J C, Dehnert I, Horn N, Leiser G Czajkowski . Pregel: a system for large-scale graph processing. In: Proceedings of 2010 ACM SIGMOD International Conference on Management of Data. 2010, 135−146
2 Y, Low D, Bickson J, Gonzalez C, Guestrin A, Kyrola J M Hellerstein . Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5( 8): 716–727
3 J E, Gonzalez Y, Low H, Gu D, Bickson C Guestrin . PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 17−30
4 X, Zhu W, Chen W, Zheng X Ma . Gemini: a computation-centric distributed graph processing system. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. 2016, 301−316
5 A, Kyrola G, Blelloch C Guestrin . GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 31−46
6 Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472−488
7 X, Zhu W, Han W Chen . GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of 2015 USENIX Conference on Usenix Annual Technical Conference. 2015, 375−386
8 K Vora . LUMOS: dependency-driven disk-based graph processing. In: Proceedings of 2019 USENIX Conference on Usenix Annual Technical Conference. 2019, 429−442
9 R, Chen J, Shi Y, Chen B, Zang H, Guan H Chen . PowerLyra: differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing, 2019, 5( 3): 13
10 Y, Cheng F, Wang H, Jiang Y, Hua D, Feng L, Zhang J Zhou . A communication-reduced and computation-balanced framework for fast graph computation. Frontiers of Computer Science, 2018, 12( 5): 887–907
11 Y, Zhang X, Liao H, Jin L, Gu L, He B, He H Liu . Cgraph: a correlations-aware approach for efficient concurrent iterative graph processing. In: Proceedings of 2018 USENIX Conference on Usenix Annual Technical Conference. 2018, 441−452
12 J, Xue Z, Yang Z, Qu S, Hou Y Dai . Seraph: an efficient, low-cost system for concurrent graph processing. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 2014, 227−238
13 J, Xue Z, Yang S, Hou Y Dai . Processing concurrent graph analytics with decoupled computation model. IEEE Transactions on Computers, 2017, 66( 5): 876–890
14 J, Zhao Y, Zhang X, Liao L, He B, He H, Jin H, Liu Y Chen . GraphM: an efficient storage system for high throughput of concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2019, 3
15 Xu X, Wang F, Jiang H, Cheng Y, Feng D, Zhang Y, Fang P. GraphCP: an I/O-efficient concurrent graph processing framework. In: Proceedings of the 29th IEEE/ACM International Symposium on Quality of Service. 2021, 1−10
16 Z, Ai M, Zhang Y, Wu X, Qian K, Chen W Zheng . Squeezing out all the value of loaded data: an out-of-core graph processing system with reduced disk I/O. In: Proceedings of 2017 USENIX Conference on Usenix Annual Technical Conference. 2017, 125−137
17 R, Zhu K, Zhao H, Yang W, Lin C, Zhou B, Ai Y, Li J Zhou . AliGraph: a comprehensive graph neural network platform. Proceedings of the VLDB Endowment, 2019, 12( 12): 2094–2105
18 S, Maleki D, Nguyen A, Lenharth M, Garzarán D, Padua K Pingali . DSMR: a parallel algorithm for single-source shortest path problem. In: Proceedings of 2016 International Conference on Supercomputing. 2016, 32
19 X, Liao J, Zhao Y, Zhang B, He L, He H, Jin L Gu . A structure-aware storage optimization for out-of-core concurrent graph processing. IEEE Transactions on Computers, 2022, 71( 7): 1612–1625
20 H, Liu H H Huang . Graphene: fine-grained IO management for graph computing. In: Proceedings of the 15th USENIX Conference on File and Storage Technologies. 2017, 285−299
21 Liu W, Liu H, Liao X, Jin H, Zhang Y. Straggler-aware parallel graph processing in hybrid memory systems. In: Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing. 2021, 217−226
22 M, Agostini F, O’Brien T Abdelrahman . Balancing graph processing workloads using work stealing on heterogeneous CPU-FPGA systems. In: Proceedings of the 49th International Conference on Parallel Processing. 2020, 50
23 L G Valiant . A bridging model for parallel computation. Communications of the ACM, 1990, 33( 8): 103–111
24 L, Backstrom D, Huttenlocher J, Kleinberg X Lan . Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 44−54
25 H, Kwak C, Lee H, Park S Moon . What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. 2010, 591−600
26 P, Boldi S Vigna . The webgraph framework I: compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. 2004, 595−602
27 P, Boldi M, Santini S Vigna . A large time-aware web graph. ACM SIGIR Forum, 2008, 42( 2): 33–38
28 Zhou Z, Hoffmann H. GraphZ: improving the performance of large-scale graph analytics on small-scale machines. In: Proceedings of the 34th IEEE International Conference on Data Engineering. 2018, 1368−1371
29 H, Chen M, Shen N, Xiao Y Lu . Krill: a compiler and runtime system for concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2021, 51
30 Cheng J, Liu Q, Li Z, Fan W, Lui J C S, He C. VENUS: vertex-centric streamlined graph computation on a single PC. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 1131−1142
31 Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd IEEE International Conference on Data Engineering. 2016, 409−420
32 K, Vora G, Xu R Gupta . Load the edges you need: a generic I/O optimization for disk-based graph processing. In: Proceedings of 2016 USENIX Conference on Usenix Annual Technical Conference. 2016, 507−522
33 X, Xu F, Wang H, Jiang Y, Cheng D, Feng Y Zhang . A hybrid update strategy for I/O-efficient out-of-core graph processing. IEEE Transactions on Parallel and Distributed Systems, 2020, 31( 8): 1767–1782
34 K K, Matam H, Hashemi M Annavaram . MultilogVC: efficient out-of-core graph processing framework for flash storage. In: Proceedings of 2021 IEEE International Parallel and Distributed Processing Symposium. 2021, 245−255
35 M, Zhang Y, Wu Y, Zhuo X, Qian C, Huan K Chen . Wonderland: a novel abstraction-based out-of-core graph processing system. ACM SIGPLAN Notices, 2018, 53( 2): 608–621
36 P, Pan C Li . Congra: towards efficient processing of concurrent graph queries on shared-memory machines. In: Proceedings of 2017 IEEE International Conference on Computer Design. 2017, 217−224
37 L, Zhou R, Chen Y, Xia R Teodorescu . C-graph: a highly efficient concurrent graph reachability query framework. In: Proceedings of the 47th International Conference on Parallel Processing. 2018, 79
38 J, Zhao Y, Zhang X, Liao L, He B, He H, Jin H Liu . LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2021, 45
[1] FCS-22361-OF-XX_suppl_1 Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed