Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (4) : 623-641    https://doi.org/10.1007/s11704-018-7212-z
RESEARCH ARTICLE
Redesign of the gStore system
Li ZENG1, Lei ZOU1,2()
1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
2. Beijing Institute of Big Data Research, Beijing 100871, China
 Download: PDF(601 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

gStore is an open-source native Resource Description Framework (RDF) triple store that answers SPARQL queries by subgraph matching over RDF graphs. However, there are some deficiencies in the original system design, such as answering simple queries (including onetriple pattern queries). To improve the efficiency of the system, we reconsider the system design in this paper. Specifically, we propose a new query plan generation module that generates different query plans according to the structures of query graphs. Furthermore, we re-design our vertex encoding strategy to achieve more pruning power and a new multi-join algorithm to speed up the subgraph matching process. Extensive experiments on synthetic and real RDF datasets show that our method outperforms the state-of-the-art algorithms significantly.

Keywords graph database      subgraph matching      RDF management      SPARQL query     
Corresponding Author(s): Lei ZOU   
Just Accepted Date: 25 September 2017   Online First Date: 24 January 2018    Issue Date: 14 June 2018
 Cite this article:   
Li ZENG,Lei ZOU. Redesign of the gStore system[J]. Front. Comput. Sci., 2018, 12(4): 623-641.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7212-z
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I4/623
1 Bollacker K D, Cook R P, Tufts P. Freebase: a shared database of structured general human knowledge. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence. 2007, 1962–1963
2 Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes P N, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C. Dbpedia – A largescale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 2015, 6(2): 167–195
3 Neumann T, Weikum G. RDF-3X: a RISC-style engine for RDF. Proceedings of the VLDB Endowment, 2008, 1(1): 647–659
https://doi.org/10.14778/1453856.1453927
4 Neumann T, Weikum G. The RDF-3X engine for scalable management of RDF data. VLDB Journal, 2009, 19(1): 91–113
https://doi.org/10.1007/s00778-009-0165-y
5 Weiss C, Karras P, Bernstein A. Hexastore: sextuple indexing for semantic Web data management. Proceedings of the VLDB Endowment, 2008, 1(1): 1008–1019
https://doi.org/10.14778/1453856.1453965
6 Zou L, Mo J H, Chen L, Özsu M T, Zhao D Y. gStore: answering SPARQL queries via subgraph matching. Proceedings of the VLDB Endowment, 2011, 4(8): 482–493
https://doi.org/10.14778/2002974.2002976
7 Zeng K, Yang J C, Wang H X, Shao B, Wang Z Y. A distributed graph engine forWeb scale RDF data. Proceedings of the VLDB Endowment, 2013, 6(4): 265–276
https://doi.org/10.14778/2535570.2488333
8 Zou L, Özsu M T, Chen L, Shen X C, Huang R Z, Zhao D Y. gStore: a graph-based SPARQL query engine. VLDB Journal, 2014, 23(4): 565–590
https://doi.org/10.1007/s00778-013-0337-7
9 Aluç G. Workload matters: a robust approach to physical RDF database design. Dissertation for the Doctoral Degree. Waterloo: University of Waterloo, 2015
10 Ingalalli V, Ienco D, Poncelet P, Villata S. Querying RDF data using a multigraph-based approach. In: Proceedings of the 19th International Conference on Extending Database Technology. 2016, 245–256
11 Nabti C, Seba H. A simple algorithm for subgraph queries in big graphs. 2017, arXiv preprint arXiv:1703.05547
12 Erling O. Virtuoso, a hybrid rdbms/graph column store. IEEE Data Engineering Bulletin, 2012, 35(1): 3–8
13 Mcbride B. Jena: a semantic Web toolkit. IEEE Educational Activities Department, 2002
14 Guo Y B, Pan Z X, Heflin J. LUBM: a benchmark for OWL knowledge base systems. Web Semantics Science Services and Agents on the World Wide Web, 2005, 3(2): 158–182
https://doi.org/10.1016/j.websem.2005.06.005
15 Ullmann J R. An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23(1): 31–42
https://doi.org/10.1145/321921.321925
16 Cordella L P, Foggia P, Sansone C, Vento M. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2004, 26(10): 1367–1372
https://doi.org/10.1109/TPAMI.2004.75
17 Zhao P X, Han J W. On graph query optimization in large networks. VLDB Endowment, 2010, 3(3): 340–351
https://doi.org/10.14778/1920841.1920887
18 Zhu K, Zhang Y, Lin X M, Zhu G P, Wang W. Nova: a novel and efficient framework for finding subgraph isomorphism mappings in large graphs. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2010, 140–154
https://doi.org/10.1007/978-3-642-12026-8_13
19 Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Subgraph search over massive disk resident graphs. In: Proceedings of the International Conference on Scientific and Statistical Database Management. 2011, 312–321
https://doi.org/10.1007/978-3-642-22351-8_19
20 Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Answering subgraph queries over massive disk resident graphs. World Wide Web, 2016, 19(3): 417–448
https://doi.org/10.1007/s11280-014-0322-0
21 Lee J, Han W S, Kasperovics R, Lee J H. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB Endowment, 2012, 6(2): 133–144
https://doi.org/10.14778/2535568.2448946
22 Han W S, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2013, 337–348
https://doi.org/10.1145/2463676.2465300
23 Kim J, Shin H, Han W S, Hong S, Chafi H. Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8(11): 1238–1249
https://doi.org/10.14778/2809974.2809985
24 Ren X G, Wang J H. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment, 2015, 8(5): 617–628
https://doi.org/10.14778/2735479.2735493
25 McKay B D, Piperno A. Practical graph isomorphism, II. Journal of Symbolic Computation, 2014, 60(1): 94–112
https://doi.org/10.1016/j.jsc.2013.09.003
26 Shang H C, Zhang Y, Lin X M, Yu F X. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment, 2008, 1(1): 364–375
https://doi.org/10.14778/1453856.1453899
27 Bi F, Chang L J, Lin X M, Qin L, Zhang W J. Efficient subgraph matching by postponing Cartesian products. In: Proceedings of ACM International Conference on Management of Data. 2016, 1199–1214
https://doi.org/10.1145/2882903.2915236
28 Atre M, Chaoji V, Zaki M J, Hendler J A. Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of the International Conference on World Wide Web. 2010, 41–50
https://doi.org/10.1145/1772690.1772696
29 Peng P, Zou L, Özsu M T, Chen L, Zhao D Y. Processing SPARQL queries over distributed RDF graphs. VLDB Journal, 2016, 25(2): 243–268
https://doi.org/10.1007/s00778-015-0415-0
[1] Ildar NURGALIEV, Qiang QU, Seyed Mojtaba Hosseini BAMAKAN, Muhammad MUZAMMAL. Matching user identities across social networks with limited profile data[J]. Front. Comput. Sci., 2020, 14(6): 146809-.
[2] Jihong YAN, Chengyu WANG, Wenliang CHENG, Ming GAO, Aoying ZHOU. A retrospective of knowledge graphs[J]. Front. Comput. Sci., 2018, 12(1): 55-74.
[3] Xiaoyan WANG,Tao YANG,Jinchuan CHEN,Long HE,Xiaoyong DU. RDF partitioning for scalable SPARQL query processing[J]. Front. Comput. Sci., 2015, 9(6): 919-933.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed