Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (1) : 181601    https://doi.org/10.1007/s11704-022-2368-y
Information Systems
Answering reachability queries with ordered label constraints over labeled graphs
Daoliang HE, Pingpeng YUAN(), Hai JIN
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
 Download: PDF(4670 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Reachability query plays a vital role in many graph analysis tasks. Previous researches proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graph, it highly demands Label-Constrained Reachability (LCR) query in which constraint includes a set of labels besides vertex pairs. Recent researches proposed several methods for answering some LCR queries which require appearance of some labels specified in constraints in the path. Besides that constraint may be a label set, query constraint may be ordered labels, namely OLCR (Ordered-Label-Constrained Reachability) queries which retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel bloom filter based indexing technique for answering OLCR queries. DHL can be used to check reachability between vertex pairs. If the answers are not no, then constrained DFS is performed. So, we employ DHL followed by performing constrained DFS to answer OLCR queries. We show that DHL has a bounded false positive rate, and it’s powerful in saving indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8–22.5 times smaller index space and 4.6–114 times less index construction time than two state-of-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.

Keywords graph      ordered-label-constrained reachability      partial index      bloom filter      query processing     
Corresponding Author(s): Pingpeng YUAN   
About author:

Changjian Wang and Zhiying Yang contributed equally to this work.

Just Accepted Date: 07 December 2022   Issue Date: 27 February 2023
 Cite this article:   
Daoliang HE,Pingpeng YUAN,Hai JIN. Answering reachability queries with ordered label constraints over labeled graphs[J]. Front. Comput. Sci., 2024, 18(1): 181601.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2368-y
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I1/181601
Fig.1  A toy graph G0 and the state(v) for query reach(v0,v6,a?b?c?). (a) A toy graph G0; (b) state(v) with DFS travesal
  
Fig.2  A toy graph G and the corresponding permutation. (a) An edge-labeled graph G; (b) A permutation of h1(v) and h2(l) for vG and l ∈ Σ
v Lout(v) Lin(v)
v1 (1,l1),2,(2,l1l2),(3,l1) (4,l1l2l4),(5,l1) 2
v2 (1,l2),(2,l2),3,(4,l2l4)(5,l1),(5,l2l3) (2,l1),3
v3 1,(2,l2l3),(4,l2?l4)(5,l2),(5,l3) 1,(2,l1),(3,l4)
v4 1,(5,l3) 1,(2,l1l2),(3,l2)
v5 5 (1,l3),(2,l1)(3,l1),(3,l3l4),5
v6 (2,l3),(4,l3l4),5 (1,l2),(2,l1l2),(3,l1l2)(3,l2l4),5,(5,l2)
v7 2,(4,l4) (1,l2l3),2,(2,l1?l3)(3,l2),(5,l3)
v8 4 (1,l2),(2,l4),(3,l2l4)(3,l1-l3),4,(5,l2l3),(5,l3l4)
v9 (1,l4),(1,l1l3),(2,l1?l3)(2,l2?l4),3,(4,l1?l3)(4,l2?l4),(5,l1) (5,l2l4),(5,l3l4) 3
v10 (1,l3),(2,l2l3),(4,l2l3)5,(5,l2) (3,l1),5
v11 1,(4,l2) 1,(3,l1l3),(5,l3)
Tab.1  Reduced index entries for G
  
  
  
Dataset Abbr. |V| |E| |Σ| labels Synthetic labels
Robots RT 1.2K 2.4K 4
Advogato ADG 5.4K 28K 4
arXiv AX 34K 417K 8
BioGrid BG 64K 489K 7
StringsAM

See string-db.org website.

SAM 6.1K 755K 7
Youtube YT 15K 3.5M 5
StringsFC SFC 19K 4.4M 7
webStanford WS 281K 1.2M 8
WebGoogle WG 916K 3.1M 8
CentralUSA

See konect.cc/networks/ website.

CU 14M 17M 8
FullUSA

See konect.cc/networks/ website.

FU 24M 29M 8
Tab.2  Real datasets
Dataset LI+ P2H+ DHL (K=5)
IT IS IT IS IT IS
RT 0.03 1.1 0.01 0.72 0.01 0.15
ADG 3.4 49 0.74 14 0.16 2.4
AX 76 240 306 555 4.0 36.5
BG 214 560 65 321 2.5 24.8
SAM 99 107 37 56 4.1 6.8
YT 1.4k 435 168 150 12.3 13.2
SFC 836 426 1.5k 467 29.7 26.7
WS 2.1k 4.3k 1.3k 2.4k 22.8 348
WG ? ? ? ? 64 1.3k
CU ? ? ? ? 96 3.7k
Tab.3  IT in seconds and IS in megabytes. In this table, “k” indicates 103, and “?” means timeout
Name |Σ|/4 or 2 |Σ|/2 or 4 |Σ|?2 or 6
LI+ P2H+ DHL LI+ P2H+ DHL LI+ P2H+ DHL
RT 0.1 0.2 0.03 0.1 0.17 0.06
ADG 1.05 0.68 0.62 0.45 0.53 0.7
AX 86 2.6 7.4 105 1.8 3.9 191 1.1 1.8
BG 82 1.4 4.9 137 1.0 4.0 181 0.8 1.8
SAM 21.4 0.89 5.9 8.7 0.96 26 21.5 0.76 8.0
YT 106 0.86 90 69 0.91 68
SFC 252 3.4 68 437 2.8 37 275 3.0 39
WS 14 1.3 0.03 123 1.3 0.27 798 1.64 1.28
WG ? ? 0.04 ? ? 0.37 ? ? 27.8
CU ? ? 0.01 ? ? 0.01 ? ? 0.02
Tab.4  QT*(us) for LCR of LI+, P2H+ and DHL with false query sets in real datasets. In this table, “?” means timeout, the blank means the query set doesn’t exist
Name |Σ|/4 or 2 |Σ|/2 or 4 |Σ|?2 or 6
LI+ P2H+ DHL LI+ P2H+ DHL LI+ P2H+ DHL
RT 0.1 0.09 0.02 0.1 0.23 0.04
ADG 0.25 0.68 0.19 0.32 0.64 0.47
AX 6.6 2.7 2.4 27 2.9 5.7 44 2.5 7.8
BG 6.1 0.9 1.5 15.7 1.3 5.0 26 1.3 5.9
SAM 9.4 1.1 11 13.5 1.3 22 14.7 1.4 23
YT 13.3 0.9 65 16.5 1.2 63
SFC 84.2 5.2 79 107 4.4 80 116 4.6 77
WS 1.7 1.4 0.04 15.8 1.5 0.11 76.5 1.7 0.71
WG ? ? 0.04 ? ? 0.06 ? ? 7.1
CU ? ? 0.03 ? ? 0.03 ? ? 0.03
Tab.5  QT(us) for LCR of LI+, P2H+ and DHL with random query sets in real datasets. In this table, “?” means timeout, the blank means the query set doesn’t exist
Model D LI+ P2H+ DHL QT/μs
IT (s) IS (MB) IT (s) IS (MB) IT (s) RF IS (MB) RF LI+ P2H+ DHL
2 20.8 207 12.4 146 0.43 28 19.3 7.6 7.6 0.78 1.22
PA 3 90.6 420 72.1 347 0.73 98 24.7 14 26.8 3.74 3.37
4 125 466 89.7 418 0.82 109 22.7 18 26.5 3.45 6.2
5 159 468 123 465 0.92 134 22.3 21 73.6 4.47 9.96
2 39.4 162 66.7 312 0.49 136 20.8 15 3.8 2.78 0.12
ER 3 86.5 287 383 806 0.86 445 30.5 26 24.3 7.93 1.51
4 114 388 1078 1329 1.17 921 34.6 38 66.2 12.3 2.31
5 127 446 1964 1751 1.33 1477 33 53 84.1 15.5 3.22
Tab.6  In this experiment, n = 25,000 and L = 8, the node degree varies from 2 to 5. RF denotes the reduction factor against P2H+
Fig.3  IT, IS and QT in PA-datasets. n=25,000, |Σ| varies from 2,4 to 8, the node degree D is either 2, 3, 4, or 5. (a) Indexing time/s; (b) Index size/MB; (c) Query time/μs
Fig.4  IT, IS and QT in ER-datasets. n =25,000, |Σ| varies from 2,4 to 8, the node degree D is either 2, 3, 4, or 5. (a) Indexing time/s; (b) Index size/MB; (c) Query time/μs
Fig.5  IT, IS and QT in PA-datasets. n varies from 5K, 20K, 80K to 320K. The lines represent different algorithms. (a) Indexing time/s; (b) Index size/MB; (c) Query time/μs
Fig.6  IT, IS and QT in PA-datasets. n varies from 5K, 20K, 80K to 320K. The lines represent different algorithms. (a) Indexing time/s; (b) Index size/MB; (c) Query time/μs
Name |Σ|/4 or 2 |Σ|/2 or 4 |Σ|?2 or 6
LI+ P2H+ DHL+ LI+ P2H+ DHL+ LI+ P2H+ DHL+
F R F R F R F R F R F R F R F R F R
RT 0.47 0.45 0.89 0.41 0.14 0.09 0.29 36 0.42 79 0.15 14.9
ADG 3.2 6.9 2.1 15.3 1.9 5.6 1.2 1.3ms 1.4 2.5ms 1.8 1.9ms
AX 93 ? 4.1 705ms 8.4 625ms 125 ? 2.9 ? 4.9 ? 287 ? 2.2 ? 2.8 ?
BG 90 131ms 2.4 25ms 5.7 32ms 845 ? 8.1 ? 25 ? 1.1ms ? 6.7 ? 11.5 ?
SAM 27 1.2ms 1.9 210 6.9 1.4ms 12 ? 1.6 ? 29 ? 25 ? 1.2 ? 9.1 ?
YT 115 ? 1.4 423ms 90 ? 174 ? 3.1 ? 156 ?
SFC 265 ? 4.3 ? 68 ? 446 ? 3.9 ? 37 ? 287 ? 4.5 ? 39 ?
WS 98 9.9 11 8.3 0.21 0.23 196 45 2.7 4.2 0.43 0.31 872 ? 2.3 49ms 1.4 15.8ms
WG ? ? ? ? 0.28 0.24 ? ? ? ? 0.62 0.27 ? ? ? ? 27.8 19.2
Tab.7  Query time(us) for DHL+ to answer OLCR queries in real datasets. In this table, “?” means timeout, and the blank means the query set doesn’t exist. “F” denotes false and “R” denotes random
  
  
  
1 R, Jin H, Hong H, Wang N, Ruan Y Xiang . Computing label-constraint reachability in graph databases. In: Proceedings of 2010 ACM SIGMOD International Conference on Management of Data. 2010, 123−134
2 J, Zhu Z, Nie X, Liu B, Zhang J Wen . Statsnowball: a statistical approach to extracting entity relationships. In: Proceedings of the 18th International Conference on World Wide Web. 2009, 101−110
3 C, Yang G, Liu C, Yan C Jiang . A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection. Science China Information Sciences, 2021, 64( 12): 222101
4 J, Cheng S, Huang H, Wu A W C Fu . Tf-label: a topological-folding labeling scheme for reachability querying in a large graph. In: Proceedings of 2013 ACM SIGMOD International Conference on Management of Data. 2013, 193−204
5 E, Cohen E, Halperin H, Kaplan U Zwick . Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 2003, 32( 5): 1338–1355
6 R, Jin R, Ning S, Dey J Y Xu . SCARAB: scaling reachability computation on large graphs. In: Proceedings of 2012 ACM SIGMOD International Conference on Management of Data. 2012, 169−180
7 R, Jin Y, Xiang N, Ruan H Wang . Efficiently answering reachability queries on very large directed graphs. In: Proceedings of 2008 ACM SIGMOD International Conference on Management of Data. 2008, 595−608
8 L, Li W, Hua X Zhou . HD-GDD: high dimensional graph dominance drawing approach for reachability query. World Wide Web, 2017, 20( 4): 677–696
9 J, Su Q, Zhu H, Wei J X Yu . Reachability querying: can it be even faster?. IEEE Transactions on Knowledge and Data Engineering, 2017, 29( 3): 683–697
10 H, Wei J X, Yu C, Lu R Jin . Reachability querying: an independent permutation labeling approach. Proceedings of the VLDB Endowment, 2014, 7( 12): 1191–1202
11 H, Wei J X, Yu C, Lu R Jin . Reachability querying: an independent permutation labeling approach. The VLDB Journal, 2018, 27( 1): 1–26
12 P, Yuan Y, You S, Zhou H, Jin L Liu . Providing fast reachability query services with MGTag: a multi-dimensional graph labeling method. IEEE Transactions on Services Computing, 2022, 15( 2): 1000–1011
13 P, Holme J Saramäki . Temporal networks. Physics Reports, 2012, 519( 3): 97–125
14 C, Shi Y, Li J, Zhang Y, Sun P S Yu . A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 2017, 29( 1): 17–37
15 Y, Peng Y, Zhang X, Lin L, Qin W Zhang . Answering billion-scale label-constrained reachability queries within microsecond. Proceedings of the VLDB Endowment, 2020, 13( 6): 812–825
16 L, Zou K, Xu J X, Yu L, Chen Y, Xiao D Zhao . Efficient processing of label-constraint reachability queries in large graphs. Information Systems, 2014, 40: 47–66
17 L D J, Valstar G H L, Fletcher Y Yoshida . Landmark indexing for evaluation of label-constrained reachability queries. In: Proceedings of 2017 ACM International Conference on Management of Data. 2017, 345−358
18 R, Jin N, Ruan Y, Xiang H Wang . Path-tree: an efficient reachability indexing scheme for large directed graphs. ACM Transactions on Database Systems, 2011, 36( 1): 7
19 Y, Chen Y Chen . An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 893−902
20 Schaik S J, Van Moor O De . A memory efficient reachability data structure through bit vector compression. In: Proceedings of 2011 ACM SIGMOD International Conference on Management of Data. 2011, 913−924
21 H, Wang H, He J, Yang P S, Yu J X Yu . Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). 2006, 75
22 Y Chen . General spanning trees and reachability query evaluation. In: Proceedings of the 2nd Canadian Conference on Computer Science and Software Engineering. 2009, 243−252
23 R, Jin Y, Xiang N, Ruan D Fuhry . 3-HOP: a high-compression indexing scheme for reachability query. In: Proceedings of 2009 ACM SIGMOD International Conference on Management of Data. 2009, 813−826
24 R, Jin G Wang . Simple, fast, and scalable reachability oracle. Proceedings of the VLDB Endowment, 2013, 6( 14): 1978–1989
25 R, Schenkel A, Theobald G Weikum . HOPI: an efficient connection index for complex xml document collections. In: Proceedings of the 9th International Conference on Extending Database Technology. 2004, 237−255
26 J, Cheng J X, Yu X, Lin H, Wang S Y Philip . Fast computation of reachability labeling for large graphs. In: Proceedings of the 10th International Conference on Extending Database Technology. 2006, 961−979
27 J, Cai C K Poon . Path-hop: efficiently indexing large graphs for reachability queries. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010, 119−128
28 A D, Zhu W, Lin S, Wang X Xiao . Reachability queries on large dynamic graphs: a total order approach. In: Proceedings of 2014 ACM SIGMOD International Conference on Management of Data. 2014, 1323−1334
29 S, TrßI U Leser . Fast and practical indexing and querying of very large graphs. In: Proceedings of 2007 ACM SIGMOD International Conference on Management of Data. 2007, 845−856
30 H, Yildirim V, Chaoji M J Zaki . Grail: scalable reachability index for large graphs. Proceedings of the VLDB Endowment, 2010, 3( 1−2): 276–284
31 H, Yıldırım V, Chaoji M J Zaki . Grail: a scalable index for reachability queries in very large graphs. The VLDB Journal, 2012, 21( 4): 509–534
32 S, Seufert A, Anand S, Bedathur G Weikum . FERRARI: flexible and efficient reachability range assignment for graph indexing. In: Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE). 2013, 1009−1020
33 G D, Battista P, Eades R, Tamassia I G Tollis . Graph Drawing: Algorithms for the Visualization of Graphs. New Jersey: Prentice Hall, 1999
34 R R, Veloso L, Cerf W Jr, Meira M J Zaki . Reachability queries in very large graphs: a fast refined online search approach. In: Proceedings of the 17th International Conference on Extending Database Technology. 2014, 511−522
35 N, Sengupta A, Bagchi M, Ramanath S Bedathur . ARROW: approximating reachability using random walks over web-scale graphs. In: Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE). 2019, 470−481
36 S, Wadhwa A, Prasad S, Ranu A, Bagchi S Bedathur . Efficiently answering regular simple path queries on large labeled networks. In: Proceedings of 2019 International Conference on Management of Data. 2019, 1463−1480
37 B H Bloom . Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970, 13( 7): 422–426
38 L, Luo D, Guo R T B, Ma O, Rottenstreich X Luo . Optimizing bloom filter: challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials, 2019, 21( 2): 1912–1949
[1] FCS-22368-OF-DH_suppl_1 Download
[1] Jingyu LIU, Shi CHEN, Li SHEN. A comprehensive survey on graph neural network accelerators[J]. Front. Comput. Sci., 2025, 19(2): 192104-.
[2] Tianming ZHANG, Jie ZHAO, Cibo YU, Lu CHEN, Yunjun GAO, Bin CAO, Jing FAN, Ge YU. Labeling-based centrality approaches for identifying critical edges on temporal graphs[J]. Front. Comput. Sci., 2025, 19(1): 191601-.
[3] Yuqian MA, Wenbo SHI, Xinghua LI, Qingfeng CHENG. Provable secure authentication key agreement for wireless body area networks[J]. Front. Comput. Sci., 2024, 18(5): 185811-.
[4] Jiajia CHEN, Jiancan WU, Jiawei CHEN, Xin XIN, Yong LI, Xiangnan HE. How graph convolutions amplify popularity bias for recommendation?[J]. Front. Comput. Sci., 2024, 18(5): 185603-.
[5] Zizhang WU, Yuanzhu GAN, Tianhao XU, Fan WANG. Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation[J]. Front. Comput. Sci., 2024, 18(5): 185327-.
[6] Hongru GAO, Xiaofei LIAO, Zhiyuan SHAO, Kexin LI, Jiajie CHEN, Hai JIN. A survey on dynamic graph processing on GPUs: concepts, terminologies and systems[J]. Front. Comput. Sci., 2024, 18(4): 184106-.
[7] Junfei TANG, Ran SONG, Yuxin HUANG, Shengxiang GAO, Zhengtao YU. Semantic-aware entity alignment for low resource language knowledge graph[J]. Front. Comput. Sci., 2024, 18(4): 184319-.
[8] Jingbin WANG, Weijie ZHANG, Zhiyong YU, Fangwan HUANG, Weiping ZHU, Longbiao CHEN. Route selection for opportunity-sensing and prediction of waterlogging[J]. Front. Comput. Sci., 2024, 18(4): 184503-.
[9] Rui HE, Zehua FU, Qingjie LIU, Yunhong WANG, Xunxun CHEN. Learning group interaction for sports video understanding from a perspective of athlete[J]. Front. Comput. Sci., 2024, 18(4): 184705-.
[10] Zhe YUAN, Zhewei WEI, Fangrui LV, Ji-Rong WEN. Index-free triangle-based graph local clustering[J]. Front. Comput. Sci., 2024, 18(3): 183404-.
[11] Xianghao XU, Fang WANG, Hong JIANG, Yongli CHENG, Dan FENG, Peng FANG. A disk I/O optimized system for concurrent graph processing jobs[J]. Front. Comput. Sci., 2024, 18(3): 183105-.
[12] Hengyu LIU, Tiancheng ZHANG, Fan LI, Minghe YU, Ge YU. A probabilistic generative model for tracking multi-knowledge concept mastery probability[J]. Front. Comput. Sci., 2024, 18(3): 183602-.
[13] Qi LIU, Qinghua ZHANG, Fan ZHAO, Guoyin WANG. Uncertain knowledge graph embedding: an effective method combining multi-relation and multi-path[J]. Front. Comput. Sci., 2024, 18(3): 183311-.
[14] Jiaqi LIU, Zhiwen YU, Bin GUO, Cheng DENG, Luoyi FU, Xinbing WANG, Chenghu ZHOU. EvolveKG: a general framework to learn evolving knowledge graphs[J]. Front. Comput. Sci., 2024, 18(3): 183309-.
[15] Yuanjing HAO, Long LI, Liang CHANG, Tianlong GU. MLDA: a multi-level k-degree anonymity scheme on directed social network graphs[J]. Front. Comput. Sci., 2024, 18(2): 182814-.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed