A disk I/O optimized system for concurrent graph processing jobs

doi:10.1007/s11704-023-2361-0

Front. Comput. Sci.

2024, Vol. 18

Issue (3) : 183105 https://doi.org/10.1007/s11704-023-2361-0

Architecture

A disk I/O optimized system for concurrent graph processing jobs

Xianghao XU^1,², Fang WANG², Hong JIANG³, Yongli CHENG^4,⁵(

), Dan FENG², Peng FANG²

¹. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
². Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
³. Department of Computer Science & Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
⁴. College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
⁵. Zhejiang Lab, Hangzhou 311121, China

Download: PDF(17413 KB) HTML
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks

Abstract

In order to analyze and process the large graphs with high cost efficiency, researchers have developed a number of out-of-core graph processing systems in recent years based on just one commodity computer. On the other hand, with the rapidly growing need of analyzing graphs in the real-world, graph processing systems have to efficiently handle massive concurrent graph processing (CGP) jobs. Unfortunately, due to the inherent design for single graph processing job, existing out-of-core graph processing systems usually incur unnecessary data accesses and severe competition of I/O bandwidth when handling the CGP jobs. In this paper, we propose GraphCP, a disk I/O optimized out-of-core graph processing system that efficiently supports the processing of CGP jobs. GraphCP proposes a benefit-aware sharing execution model to share the I/O access and processing of graph data among the CGP jobs and adaptively schedule the graph data loading based on the states of vertices, which efficiently overcomes above challenges faced by existing out-of-core graph processing systems. Moreover, GraphCP adopts a dependency-based future-vertex updating model so as to reduce disk I/Os in the future iterations. In addition, GraphCP organizes the graph data with a Source-Sorted Sub-Block graph representation for better processing capacity and I/O access locality. Extensive evaluation results show that GraphCP is 20.5× and 8.9× faster than two out-of-core graph processing systems GridGraph and GraphZ, and 3.5× and 1.7× faster than two state-of-art concurrent graph processing systems Seraph and GraphSO.

Keywords graph processing disk I/O concurrent jobs

Corresponding Author(s): Yongli CHENG

Just Accepted Date: 10 February 2023 Issue Date: 26 April 2023

Cite this article:

Xianghao XU,Fang WANG,Hong JIANG, et al. A disk I/O optimized system for concurrent graph processing jobs[J]. Front. Comput. Sci., 2024, 18(3): 183105.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2361-0
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I3/183105

Fig.1 Example graph and the partitioned four edge blocks that reside on disk. (a) An example graph; (b) Edge blocks

Fig.2 The GraphCP architecture

Fig.3 Illustration of the Source-Sorted Sub-Block representation

Notation	Definition
V	The vertex set
E	The edge set
P	Number of intervals
J	Number of CGP jobs
A	Active vertex set in current iteration
M	Size of an edge structure value
N	Size of a vertex value
W	Size of an edge weight value
$S s e q$	Size of edges that are sequentially read
$S r a n$	Size of edges that are randomly read
$T r r$	Random read bandwidth
$T r w$	Random write bandwidth
$T s r$	Sequential read bandwidth
$T s w$	Sequential write bandwidth

Tab.1 Notations

Fig.4 One iteration of execution for GraphCP when the number of active vertices is small

Fig.5 Two consecutive iterations of execution for GraphCP when the number of active vertices is large

Tab.2 Datasets used in evaluation

Tab.3 Execution time (s)

Fig.6 Runtime breakdown on SK2005

Fig.7 I/O traffic comparison

Workload	Algorithms	Characteristic
Het	{PR, BFS, WCC, SSSP}	Heterogeneous
Hom1	{BFS} $×$ 4	Homogeneous
Hom2	{SSSP} $×$ 4	Homogeneous
Sim1	{BFS, SSSP} $×$ 2	Similar Algorithms
Sim2	{PR, WCC} $×$ 2	Similar Algorithms

Tab.4 Workloads of the CGP Jobs

Fig.8 Performance comparisons on different workloads. (a) Twitter2010; (b) UK2007

Fig.9 Preprocessing time comparison

Fig.10 Comparison of the performance for different I/O access models

Tab.5 Extra computation overheads vs. Reduced I/O time overheads

Fig.11 Comparison of execution time in each iteration for different I/O access models (From 1 to 15 iteration)

Fig.12 Performance comparison of GraphCP with/without the DFVU model. (a) Comparison of execution time; (b) comparison of I/O traffic

Fig.13 Performance variation with different number of threads. (a) LiveJournal; (b) SK2005

Fig.14 Performance variation on different I/O devices

Fig.15 Performance variation with different number of jobs

1	G, Malewicz M H, Austern A J C, Bik J C, Dehnert I, Horn N, Leiser G Czajkowski . Pregel: a system for large-scale graph processing. In: Proceedings of 2010 ACM SIGMOD International Conference on Management of Data. 2010, 135−146
2	Y, Low D, Bickson J, Gonzalez C, Guestrin A, Kyrola J M Hellerstein . Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5( 8): 716–727
3	J E, Gonzalez Y, Low H, Gu D, Bickson C Guestrin . PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 17−30
4	X, Zhu W, Chen W, Zheng X Ma . Gemini: a computation-centric distributed graph processing system. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. 2016, 301−316
5	A, Kyrola G, Blelloch C Guestrin . GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 31−46
6	Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472−488
7	X, Zhu W, Han W Chen . GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of 2015 USENIX Conference on Usenix Annual Technical Conference. 2015, 375−386
8	K Vora . LUMOS: dependency-driven disk-based graph processing. In: Proceedings of 2019 USENIX Conference on Usenix Annual Technical Conference. 2019, 429−442
9	R, Chen J, Shi Y, Chen B, Zang H, Guan H Chen . PowerLyra: differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing, 2019, 5( 3): 13
10	Y, Cheng F, Wang H, Jiang Y, Hua D, Feng L, Zhang J Zhou . A communication-reduced and computation-balanced framework for fast graph computation. Frontiers of Computer Science, 2018, 12( 5): 887–907
11	Y, Zhang X, Liao H, Jin L, Gu L, He B, He H Liu . Cgraph: a correlations-aware approach for efficient concurrent iterative graph processing. In: Proceedings of 2018 USENIX Conference on Usenix Annual Technical Conference. 2018, 441−452
12	J, Xue Z, Yang Z, Qu S, Hou Y Dai . Seraph: an efficient, low-cost system for concurrent graph processing. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 2014, 227−238
13	J, Xue Z, Yang S, Hou Y Dai . Processing concurrent graph analytics with decoupled computation model. IEEE Transactions on Computers, 2017, 66( 5): 876–890
14	J, Zhao Y, Zhang X, Liao L, He B, He H, Jin H, Liu Y Chen . GraphM: an efficient storage system for high throughput of concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2019, 3
15	Xu X, Wang F, Jiang H, Cheng Y, Feng D, Zhang Y, Fang P. GraphCP: an I/O-efficient concurrent graph processing framework. In: Proceedings of the 29th IEEE/ACM International Symposium on Quality of Service. 2021, 1−10
16	Z, Ai M, Zhang Y, Wu X, Qian K, Chen W Zheng . Squeezing out all the value of loaded data: an out-of-core graph processing system with reduced disk I/O. In: Proceedings of 2017 USENIX Conference on Usenix Annual Technical Conference. 2017, 125−137
17	R, Zhu K, Zhao H, Yang W, Lin C, Zhou B, Ai Y, Li J Zhou . AliGraph: a comprehensive graph neural network platform. Proceedings of the VLDB Endowment, 2019, 12( 12): 2094–2105
18	S, Maleki D, Nguyen A, Lenharth M, Garzarán D, Padua K Pingali . DSMR: a parallel algorithm for single-source shortest path problem. In: Proceedings of 2016 International Conference on Supercomputing. 2016, 32
19	X, Liao J, Zhao Y, Zhang B, He L, He H, Jin L Gu . A structure-aware storage optimization for out-of-core concurrent graph processing. IEEE Transactions on Computers, 2022, 71( 7): 1612–1625
20	H, Liu H H Huang . Graphene: fine-grained IO management for graph computing. In: Proceedings of the 15th USENIX Conference on File and Storage Technologies. 2017, 285−299
21	Liu W, Liu H, Liao X, Jin H, Zhang Y. Straggler-aware parallel graph processing in hybrid memory systems. In: Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing. 2021, 217−226
22	M, Agostini F, O’Brien T Abdelrahman . Balancing graph processing workloads using work stealing on heterogeneous CPU-FPGA systems. In: Proceedings of the 49th International Conference on Parallel Processing. 2020, 50
23	L G Valiant . A bridging model for parallel computation. Communications of the ACM, 1990, 33( 8): 103–111
24	L, Backstrom D, Huttenlocher J, Kleinberg X Lan . Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 44−54
25	H, Kwak C, Lee H, Park S Moon . What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. 2010, 591−600
26	P, Boldi S Vigna . The webgraph framework I: compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. 2004, 595−602
27	P, Boldi M, Santini S Vigna . A large time-aware web graph. ACM SIGIR Forum, 2008, 42( 2): 33–38
28	Zhou Z, Hoffmann H. GraphZ: improving the performance of large-scale graph analytics on small-scale machines. In: Proceedings of the 34th IEEE International Conference on Data Engineering. 2018, 1368−1371
29	H, Chen M, Shen N, Xiao Y Lu . Krill: a compiler and runtime system for concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2021, 51
30	Cheng J, Liu Q, Li Z, Fan W, Lui J C S, He C. VENUS: vertex-centric streamlined graph computation on a single PC. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 1131−1142
31	Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph: an efficient graph processing system on a single machine. In: Proceedings of the 32nd IEEE International Conference on Data Engineering. 2016, 409−420
32	K, Vora G, Xu R Gupta . Load the edges you need: a generic I/O optimization for disk-based graph processing. In: Proceedings of 2016 USENIX Conference on Usenix Annual Technical Conference. 2016, 507−522
33	X, Xu F, Wang H, Jiang Y, Cheng D, Feng Y Zhang . A hybrid update strategy for I/O-efficient out-of-core graph processing. IEEE Transactions on Parallel and Distributed Systems, 2020, 31( 8): 1767–1782
34	K K, Matam H, Hashemi M Annavaram . MultilogVC: efficient out-of-core graph processing framework for flash storage. In: Proceedings of 2021 IEEE International Parallel and Distributed Processing Symposium. 2021, 245−255
35	M, Zhang Y, Wu Y, Zhuo X, Qian C, Huan K Chen . Wonderland: a novel abstraction-based out-of-core graph processing system. ACM SIGPLAN Notices, 2018, 53( 2): 608–621
36	P, Pan C Li . Congra: towards efficient processing of concurrent graph queries on shared-memory machines. In: Proceedings of 2017 IEEE International Conference on Computer Design. 2017, 217−224
37	L, Zhou R, Chen Y, Xia R Teodorescu . C-graph: a highly efficient concurrent graph reachability query framework. In: Proceedings of the 47th International Conference on Parallel Processing. 2018, 79
38	J, Zhao Y, Zhang X, Liao L, He B, He H, Jin H Liu . LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2021, 45

[1]

FCS-22361-OF-XX_suppl_1

Download

[1]	Hongru GAO, Xiaofei LIAO, Zhiyuan SHAO, Kexin LI, Jiajie CHEN, Hai JIN. A survey on dynamic graph processing on GPUs: concepts, terminologies and systems[J]. Front. Comput. Sci., 2024, 18(4): 184106-.
[2]	Xinqiao LV, Wei XIAO, Yu ZHANG, Xiaofei LIAO, Hai JIN, Qiangsheng HUA. An effective framework for asynchronous incremental graph processing[J]. Front. Comput. Sci., 2019, 13(3): 539-551.
[3]	Wuyang JU,Jianxin LI,Weiren YU,Richong ZHANG. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.

Viewed

Full text

Abstract

Cited

Shared

Discussed