Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2022, Vol. 16 Issue (3) : 163606    https://doi.org/10.1007/s11704-020-0360-y
RESEARCH ARTICLE
A subgraph matching algorithm based on subgraph index for knowledge graph
Yunhao SUN1, Guanyu LI1(), Jingjing DU1, Bo NING1, Heng CHEN1,2
1. Faculty of Information Science and Technology, Dalian Maritime University, Liaoning 116026, China
2. Faculty of Software, Dalian University of Foreign Languages, Liaoning 116044, China
 Download: PDF(12660 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

The problem of subgraph matching is one fundamental issue in graph search, which is NP-Complete problem. Recently, subgraph matching has become a popular research topic in the field of knowledge graph analysis, which has a wide range of applications including question answering and semantic search. In this paper, we study the problem of subgraph matching on knowledge graph. Specifically, given a query graph q and a data graph G, the problem of subgraph matching is to conduct all possible subgraph isomorphic mappings of q on G. Knowledge graph is formed as a directed labeled multi-graph having multiple edges between a pair of vertices and it has more dense semantic and structural features than general graph. To accelerate subgraph matching on knowledge graph, we propose a novel subgraph matching algorithm based on subgraph index for knowledge graph, called as F G q T-Match. The subgraph matching algorithm consists of two key designs. One design is a subgraph index of matching-driven flow graph ( F G q T), which reduces redundant calculations in advance. Another design is a multi-label weight matrix, which evaluates a near-optimal matching tree for minimizing the intermediate candidates. With the aid of these two key designs, all subgraph isomorphic mappings are quickly conducted only by traversing F G q T. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

Keywords knowledge graph      subgraph matching      subgraph index      matching tree     
Corresponding Author(s): Guanyu LI   
Just Accepted Date: 29 September 2020   Issue Date: 18 September 2021
 Cite this article:   
Yunhao SUN,Guanyu LI,Jingjing DU, et al. A subgraph matching algorithm based on subgraph index for knowledge graph[J]. Front. Comput. Sci., 2022, 16(3): 163606.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0360-y
https://academic.hep.com.cn/fcs/EN/Y2022/V16/I3/163606
Fig.1  RDF-modeled graphs. (a) Query graph; (b) Data graph
Fig.2  Knowledge multi-graphs. (a) Query multi-graph; (b) Data multi-graph
Fig.3  Data knowledge multi-graph
Fig.4  Subgraph matching. (a) Query graph q; (b) Data graph G
Notations Meanings
q = ( V q, E q, L) a directed query multi-graph with V q, edge set E q and labeling function L
G = ( V, E, L) a directed data multi-graph with vertex set V, edge set E and labeling function L
q T = ( V T, E T) a matching tree of q with node set V T and edge set E T
F G q T = ( r, V F, E F, L F) a matching-driven flow graph of q on G with a root r, vertex set V F, edge set E F and labeling function L F
? v , u ? a query-data vertex pair with v V and u V q
M a subgraph isomorphic mapping of q on G,
M = { ? v 1 , u 1 ? , ? v 2 , u 2 ? , , ? v n , u n ? }
M ~ i a partial subgraph isomorphic mapping
M ~ i = { ? v 1 , u 1 ? , ? v 2 , u 2 ? , , ? v i , u i ? }, M ~ i ? M
M a set of subgraph isomorphic mappings Ms of q on G
C ( u ) a candidate set of u, u V q
C R ( u , v ) a candidate region of u and adjacent to v, u V q, v V
Tab.1  Notations and meanings
Fig.5  Compact path index
Semantic Dictionary Data Dictionary of G
University A U 1 v 1
GraduatedStudent B P 1 v 2
Student C Data Storage of Vertices
hasName D v 1 { A }
GraduatedFrom a v 2 { B, C }, { ? D , L i l y ? }
Data Storage of Edges
e ( v 2, v 1) a
Tab.2  Semantic and data dictionaries
Fig.6  Our knowledge graph. (a) Query graph; (b) Data graph
Fig.7  
Fig.8  Knowledge multi-graphs. (a) Query multi-graph q; (b) Matching tree q T ; (c) Data multi-graph G
Fig.9  Top-down construction of F G q T
Fig.10  
Fig.11  
Fig.12  Bottom-up refinement of F G q T
Fig.13  
Fig.14   F G q T induced by a matching order ( u 4 , u 3 , u 2 , u 1 )
Fig.15  Weight matrix with its computed matrixes. (a) Matrix P; (b) Matrix E; (c) Matrix T; (d) Matrix W
Fig.16  Knowledge multi-graphs. (a) Data multi-graph; (b) Query multi-graph
Fig.17  A spanning tree
Fig.18  A path-merged spanning tree
Fig.19  A matching tree
Fig.20  
Methods Index Structures Matching Order
TurboHOM Adjacency List Infrequent Node First
CFLMatch Compact Path Index Infrequent Path First
F G q T-Match Subgraph Index Cost-Balanced Matching Tree
Tab.3  Compared algorithms
Parameters Dimensions
Query Size 10 15 20 25 30
Data Size 10 4 5 × 10 4 10 5 5 × 10 5 10 6
Density 0.01 0.1 1 10.0 100
Tab.4  Experimental parameters
Fig.21  Candidate and edge counts of Q T 5 queryset on DBpedia dataset. (a) Candidate count; (b) Edge count
Fig.22  Matching time of Q N T 5 queryset on DBpedia dataset. (a) Total matching time; (b) Matching time
Fig.23  Proportions of edges to vertices ( E F/ V F) in F G q T
Fig.24  Candidate sizes of Q N T querysets on DBpedia and Yago datasets. (a) Candidate ratio on DBpedia; (b) Candidate ratio on Yago
Fig.25  Total matching time on different scales of DBpedia subset
Fig.26  Total time-efficiencies of Q N T querysets on DBpedia and Yago datasets. (a) Tota matching time on DBpedia; (b) Total matching time on Yago
1 S Hu , L Zou , J X Yu , H Wang , D Zhao . Answering natural language questions by subgraph matching over knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 2018, 30( 5): 824– 837
2 Xu Q, Wang X, Li J, Gan Y, Chai L, Wang J. StarMR: an efficient star-decomposition based query processor for SPARQL basic graph patterns using MapReduce. In: proceedings of Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data. 2018, 415-430
3 T Cai , J Li , A S Mian , T Sellis , J X Yu . Target-aware holistic influence maximization in spatial social networks. IEEE Transactions on Knowledge and Data Engineering, 2020,
4 Shekhar S, Xiong H, Zhou X. Encyclopedia of GIS: Resource Description Framework(RDF). 1st ed. Cham: Springer International Publishing, 2017
5 Garey M R, Johnson D S. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1st ed. New York: W. H. Freeman, 1979
6 J Kim , H Shin , W H Han , S Hong , H Chafi . Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8( 11): 1238– 1249
7 Ingalalli V, Ienco D, Poncelet P, Villata S. Querying RDF data using a multigraph-based approach. In: Proceedings of the 19th International Conference on Extending Database Technology. 2016, 245-256
8 H Ma , M A Langouri , Y Wu , F Chiang , J Pi . Ontology-based entity matching in attributed graphs. Proceedings of the VLDB Endowment, 2019, 12( 10): 1195– 1207
9 L P Cordella , P Foggia , C Sansone , M Vento . A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 26( 10): 1367– 1372
10 He H, Singh A K. Graphs-at-a-time: query language and accessmethods for graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 405-418
11 P Zhao , J Han . On graph query optimization in large networks. Proceedings of the VLDB Endowment, 2010, 3( 1): 340– 351
12 Han W, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2013, 337–348
13 Bi F, Chang L, Lin X, Qin L, Zhang W. Efficient subgraph matching by postponing cartesian products. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2016, 1199−1214
14 H Shang , Y Zhang , X Lin , J X Yu . Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment, 2008, 1( 1): 364– 375
15 Kim K, Seo I, Han W S, Hong S, Chafi H, Shin H, Jeong G. Turboflux: A fast continuous subgraph matching system for streaming graph data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2018, 411-426
16 J R Ullmann . An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23( 1): 31– 42
17 Jin X, Lai L. MPMatch: A Multi-core Parallel Subgraph Matching Algorithm. In: Proceedings of IEEE 35th International Conference on Data Engineering Workshops. 2019, 241−248
18 Bhattarai B, Liu H, Huang H. CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2019, 1447−1462
19 P Peng , L Zou , Z Du , D Zhao . Using partial evaluation in holistic subgraph search. Frontiers of Computer Science, 2017, 12( 5): 966– 983
20 Y Ma , Y Yuan , M Liu , G Wang , Y Wang . Graph simulation on large scale temporal graphs. GeoInformatica, 2020, 24( 1): 199– 220
21 P Lin , Q Song , Y Wu . Fact checking in knowledge graphs with ontological subgraph patterns. Data Science and Engineering, 2018, 3 : 341– 358
22 Xu Y, Tong Y, Shi Y, Tao Q, Xu Ke, Li W. An Efficient Insertion Operator in Dynamic RideSharing Services. In: Proceedings of IEEE 35th International Conference on Data Engineering. 2019, 1022−1033
23 L Zou , M T Özsu , L Chen , X Shen , R Huang , D Zhao . gStore: a graph-based SPARQL query engine. The VLDB Journal, 2014, 23( 4): 565– 590
24 L Zeng , L Zou . Redesign of the gStore system. Frontiers of Computer science, 2018, 12( 4): 1– 19
25 X Wang , Le Chai , Q Xu , Y Yang , J Li , J Wang , Y Chai . Efficient subgraph matching on large RDF graphs using MapReduce. Data Science and Engineering, 2019, 4 : 24– 43
26 Q Xu , X Wang , J Li , Q Zhang , L Chai . Distributed subgraph matching on big knowledge graphs using pregel. IEEE Access, 2019, 7 : 116453– 116464
27 Malewicz G, Austern M H, Bik, A J C, Dehnert J C. Pregel: A system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 135−146
28 J Li , T Cai , K Deng , X Wang , T Sellis , F Xia . Community-diversified influence maximization in social networks. Information Systems, 2020, 92 : 101522–
29 Y Ma , Y Yuan , G Wang , X Bi , Z Wang , Y Wang . Rising star evaluation based on extreme learning machine in geo-social networks. Cognitive Computation, 2020, 12( 1): 296– 308
30 Wang Y, Tong Y, Long C, Xu P, Xu K, Lv W. Adaptive dynamic bipartite graph matching: a reinforcement learning approach. In: Proceedings of IEEE 35th International Conference on Data Engineering, 2019, 1478−1489
31 W Zheng , L Zou , W Peng , X Yan , S Song , D Zhao . Semantic SPARQL similarity search over RDF knowledge graphs. Proceedings of the VLDB Endowment, 2016, 9( 11): 840– 851
[1] Yongchun ZHU, Fuzhen ZHUANG, Xiangliang ZHANG, Zhiyuan QI, Zhiping SHI, Juan CAO, Qing HE. Combat data shift in few-shot learning with knowledge graph[J]. Front. Comput. Sci., 2023, 17(1): 171305-.
[2] Chuan SHI, Jiayu DING, Xiaohuan CAO, Linmei HU, Bin WU, Xiaoli LI. Entity set expansion in knowledge graph: a heterogeneous information network perspective[J]. Front. Comput. Sci., 2021, 15(1): 151307-.
[3] Ildar NURGALIEV, Qiang QU, Seyed Mojtaba Hosseini BAMAKAN, Muhammad MUZAMMAL. Matching user identities across social networks with limited profile data[J]. Front. Comput. Sci., 2020, 14(6): 146809-.
[4] Li ZENG, Lei ZOU. Redesign of the gStore system[J]. Front. Comput. Sci., 2018, 12(4): 623-641.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed