Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (3) : 143601    https://doi.org/10.1007/s11704-019-7235-0
RESEARCH ARTICLE
Most similar maximal clique query on large graphs
Yun PENG1(), Yitong XU2, Huawei ZHAO1, Zhizheng ZHOU1, Huimin HAN3
1. Research Center of Big Data Application, Qilu University of Technology, Jinan 250353, China
2. Data Science Institute, Columbia University, New York 10027, USA
3. College of Mechanics and Materials, Hohai University, Nanjing 210098, China
 Download: PDF(1249 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

This paper studies the most similar maximal clique query (MSMCQ). Given a graphG and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q. MSMCQ has many real applications including advertising industry, public security, task crowdsourcing and social network, etc. MSMCQ can be studied as a special case of the general set similarity query (SSQ). However, the MCs of G has several specialties from the general sets. Based on the specialties of MCs, we propose a novel index, namely MCIndex. MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time. Specifically, we first construct an inverted index I for all the MCs of G. Since the MCs in a posting list often have a lot of overlaps, MCIndex selects some pivots to cluster the MCs with a small radius. Given a query Q, we compute the distance from the pivots to Q. The clusters of the pivots assured not answer can be pruned by our distance based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on I(v) with an approximation ratio 1+ ln |I(v)|. Since the MCs have properties that are inherent of graph structure, we further propose a SIndex within each cluster of a MCIndex and a structure based pruning rule. SIndex can significantly reduce the number of candidates. Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation, we also propose a binary representation of MCs to improve the efficiency of the intersection size computation. Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.

Keywords most similar maximal clique      similarity query      graph data     
Corresponding Author(s): Yun PENG   
Issue Date: 10 January 2020
 Cite this article:   
Yun PENG,Yitong XU,Huawei ZHAO, et al. Most similar maximal clique query on large graphs[J]. Front. Comput. Sci., 2020, 14(3): 143601.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-7235-0
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I3/143601
1 M Hamann, E Röhrs, D Wagner. Local community detection based on small cliques. Algorithms, 2017, 10(3): 1–22
2 W Cui, Y Xiao, H Wang, W Wang. Local search of communities in large graphs. In: Proceedings of the 2014 ACMSIGMOD International Conference on Management of Data. 2014, 991–1002
3 T Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 2010, 56(1): 3–16
4 Y Wu, R Jin, J Li, X Zhang. Robust local community detection: on free rider effect and its elimination. Proceedings of the VLDB Endowment, 2015, 8(7): 798–809
5 M Wang, C Wang, J X Yu, J Zhang. Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proceedings of the VLDB Endowment, 2015, 8(10): 998–1009
6 H Cai, V W Zheng, F Zhu, K C C Chang, Z Huang. From community detection to community profiling. Proceedings of the VLDB Endowment, 2017, 10(7): 817–828
7 D Palsetia, M M A Patwary, W Hendrix, A Agrawal, A Choudhary. Clique guided community detection. In: Proceedings of the 2014 IEEE International Conference on Big Data. 2014, 500–509
8 V Boginski, S Butenko, P M Pardalos. Mining market data: a network approach. Computers & Operations Research, 2006, 33(11): 3171–3184
9 N Berry, T Ko, T Moy, J Smrcka, J Turnley, B Wu. Emergent clique formation in terrorist recruitment. In: Proceedings of the Workshop on Agent Organizations: Theory and Practice of AAAI ’04. 2004, 1–8
10 F Kose, W Weckwerth, T Linke, O Fiehn. Visualizing plant metabolomic correlation network using clique-metabolite matrices. Bioinformatics, 2001, 17: 1198–1208
11 J Cheng, L Zhu, Y Ke, S Chu. Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 1240–1248
12 K Makino, T Uno. New algorithms for enumerating all maximal cliques. In: Proceedings of Scandinavian Workshop on Algorithm Theory. 2004, 260–272
13 P R Östergård. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 2002, 120(1–3): 197–207
14 X Liang, R Lu, X Lin, X Shen. Security and Privacy in Mobile Social Networks. Springer-Verlag New York, 2013
15 H Sarvari, E Abozinadah, A Mbaziira, D Mccoy. Constructing and analyzing criminal networks. In: Proceedings of the 2014 IEEE Security and Privacy Workshops. 2014, 84–91
16 D Schall. Service-Oriented Crowdsourcing. Springer-Verlag New York, 2012
17 K Bacon, P Dewan. Mixed-initiative friend-list creation. In: Proceedings of the 12th European Conference on Computer Supported Cooperative Work. 2011, 293–312
18 W Cui, Y Xiao, H Wang, Y Lu, W Wang. Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013, 277–288
19 T Matsunaga, C Yonemori, E Tomita, M Muramatsu. Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009, 10(1): 205
20 S Sarawagi, A Kirpal. Efficient set joins on similarity predicates. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004, 743–754
21 M Hadjieleftheriou, A Chandel, N Koudas, D Srivastava. Fast indexes and algorithms for set similarity selection queries. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 267–276
22 M Hadjieleftheriou, D Srivastava. Weighted set-based string similarity. IEEE Data Engineering Bulletin, 2010, 33(1): 25–36
23 J S Culpepper, A Moffat. Efficient set intersection for inverted indexing. ACM Transactions on Information System, 2010, 29(1): 1–24
24 H Wu, G Li, L Zhou. Ginix: generalized inverted index for keyword search. Tsinghua Science and Technology, 2013, 18(1): 77–87
25 D Deng, G Li, H Wen, J Feng. An efficient partition based method for exact set similarity joins. Proceedings of the VLDB Endowment, 2015, 9(4): 360–371
26 L Yuan, L Qin, X Lin, L Chang, W Zhang. Diversified top-k clique search. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 387–398
27 J Wang, J Cheng, A W C Fu. Redundancy-aware maximal cliques. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 122–130
28 C Li, J Lu, Y Lu. Efficient merging and filtering algorithms for approximate string searches. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 257–266
29 R J Bayardo, Y Ma, R Srikant. Scaling up all pairs similarity search. In: Proceedings of the 16th International WorldWideWeb Conference. 2007, 131–140
30 C Xiao, W Wang, X Lin, J X Yu. Efficient similarity joins for near duplicate detection. In: Proceedings of the 17th International WorldWide Web Conference. 2008, 131–140
31 J Wang, G Li, J Feng. Can we beat the prefix filtering?: An adaptive framework for similarity join and search. In: Proceedings of the 2012 ACM International Conference on Management of Data. 2012, 85–96
32 C Xiao, W Wang, X Lin, H Shang. Top-k set similarity joins. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 916–927
33 D Deng, G Li, J Feng. A pivotal prefix based filtering algorithm for string similarity search. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 673–684
34 N Ao, F Zhang, D Wu, D S Stones, G Wang, X Liu, J Liu, S Lin. Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proceedings of the VLDB Endowment, 2011, 4(8): 470–481
35 H Inoue, M Ohara, K Taura. Faster set intersection with simd instructions by reducing branch mispredictions. Proceedings of the VLDB Endowment, 2014, 8(3): 293–304
36 R Vernica, M J Carey, C Li. Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 495–506
37 A C K Bolin Ding. Fast set intersection in memory. In: Proceedings of the 37th International Conference on Very Large Databases. 2011, 255–266
38 Z Fan, Y Peng, B Choi, J Xu, S S Bhowmick. Towards efficient authenticated subgraph query service in outsourced graph databases. IEEE Transactions on Services Computing, 2014, 7(4): 696–713
39 V Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 1979, 4(3): 233–235
40 D Eppstein, M Löffler, D Strash. Listing all maximal cliques in sparse graphs in near-optimal time. Algorithms and Computation, 2010, 403–414
[1] Article highlights Download
[1] Zhigang ZHANG, Xiaodong QI, Yilin WANG, Cheqing JIN, Jiali MAO, Aoying ZHOU. Distributed top-k similarity query on big trajectory streams[J]. Front. Comput. Sci., 2019, 13(3): 647-664.
[2] Li ZENG, Lei ZOU. Redesign of the gStore system[J]. Front. Comput. Sci., 2018, 12(4): 623-641.
[3] Jihong YAN, Chengyu WANG, Wenliang CHENG, Ming GAO, Aoying ZHOU. A retrospective of knowledge graphs[J]. Front. Comput. Sci., 2018, 12(1): 55-74.
[4] Lianyin JIA, Jianqing XI, Mengjuan LI, Yong LIU, Decheng MIAO. ETI: an efficient index for set similarity queries[J]. Front Comput Sci, 2012, 6(6): 700-712.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed