Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2016, Vol. 10 Issue (1) : 113-123    https://doi.org/10.1007/s11704-015-4570-7
RESEARCH ARTICLE
Accuracy estimation of link-based similarity measures and its application
Yinglong ZHANG1,3,Cuiping LI2,*(),Chengwang XIE1,3,Hong CHEN2
1. School of Software, East China Jiaotong University, Nanchang 330045, China
2. Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education,and Department of Computer Science, Renmin University of China, Beijing 100872, China
3. Intelligent Optimization and Information Processing Laboratory, East China Jiaotong University,Nanchang 330013, China
 Download: PDF(806 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Link-based similarity measures play a significant role in many graph based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and Sim-Rank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel linkbased similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases so does the overhead of the calculation.The ideal solution is that computing similarity within the minimum number of iterations is sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing an accurate and tight upper bounds for PPR,SR, and P-Rank in the paper. Our upper bounds are designed based on the following intuition: the smaller the difference between the two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similarity scores becomes. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar nodes queries, where our upper bounds helps accelerate the speed of the query.We also run a comprehensive set of experiments on real world data sets to verify the effectiveness and efficiency of our upper bounds.

Keywords personalized PageRank      SimRank      P-Rank      upper bound     
Corresponding Author(s): Cuiping LI   
Just Accepted Date: 16 March 2015   Issue Date: 06 January 2016
 Cite this article:   
Yinglong ZHANG,Cuiping LI,Chengwang XIE, et al. Accuracy estimation of link-based similarity measures and its application[J]. Front. Comput. Sci., 2016, 10(1): 113-123.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-4570-7
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I1/113
1 Gupta P, Goel A, Lin J, Sharma A,Wang D, Zadeh R.WTF: the who to follow service at Twitter. In: Proceedings of International World Wide Web Conference. 2013, 505–514
2 Liben-Nowell D, Kleinberg J. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology,2007, 58(7): 1019–1031
https://doi.org/10.1002/asi.20591
3 Joshi A, Kumar R, Reed B, Tomkins A. Anchor-based proximity measures.In: Proceedings of International World Wide Web Conference.2007, 1131–1132
https://doi.org/10.1145/1242572.1242730
4 Antonellis I, Molina H G, Chang C C. SimRank++: query rewriting through link analysis of the click graph. Proceedings of the VLDB Endowment,2008, 1(1): 408–421
https://doi.org/10.14778/1453856.1453903
5 Jeh G, Widom J. Scaling personalized web search. In: Proceedings of International World Wide Web Conference. 2003, 271–279
https://doi.org/10.1145/775152.775191
6 Jeh G, Widom J. SimRank: a measure of structural-context similarity.In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2002, 538–543
https://doi.org/10.1145/775047.775126
7 Sarkar P,Moore A W, Prakash A. Fast incremental proximity search in large graphs. In: Proceedings of International Conference on Machine Learning. 2008, 896–903
https://doi.org/10.1145/1390156.1390269
8 Sarkar P,Moore AW. A tractable approach to finding closest truncatedcommute-time neighbors in large graphs. In: Proceedings of Uncertainty in Artificial Intelligence. 2007, 335–343
9 Zhao P, Han J, Sun Y. P-Rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 553–562
https://doi.org/10.1145/1645953.1646025
10 Lizorkin D, Velikhov P, Grinev M N, Turdakov D. Accuracy estimate and optimization techniques for SimRank computation. Proceedings of the VLDB Endowment, 2008, 1(1): 422–433
https://doi.org/10.14778/1453856.1453904
11 Sun L, Cheng R, Li X, Cheung D W, Han J. On link-based similarity join. The Proceedings of the VLDB Endowment, 2011, 4(11): 714–725
12 Zhang Y, Li C, Xie C, Chen H. Accuracy estimation of link-based similarity measures and its application. In: Proceedings of Web-Age Information Management WAIM. 2014, 100–112
https://doi.org/10.1007/978-3-319-08010-9_13
13 Zhang Y, Li C, Chen H, Sheng L. Fast SimRank computation over disk-resident graphs. In: Proceedings of International Conference of Database Systems for Advanced Applications. 2013, 16–30
https://doi.org/10.1007/978-3-642-37450-0_2
14 Lizorkin D, Velikhov P, Grinev M N, Turdakov D. Accuracy estimate and optimization techniques for SimRank computation. The International Journal on Very Large Data Bases, 2010, 19(1): 45–66
https://doi.org/10.1007/s00778-009-0168-8
15 Zhu F, Fang Y, Chang K C C, Ying J. Incremental and accuracy-aware personalized PageRank through scheduled approximation. The Proceedings of the VLDB Endowment, 2013, 6(6): 481–492
https://doi.org/10.14778/2536336.2536348
16 Lee P, Lakshmanan L V S, Yu J X. On top-k structural similarity search. In: Proceedings of International Conference on Data Engineering.2012, 774–785
https://doi.org/10.1109/icde.2012.109
17 Yu W, Lin X, Zhang W. Towards efficient simrank computation on large networks. In: Proceedings of International Conference on Data Engineering.2013, 601–612
18 Li X, Yu W, Yang B, Le J. ASAP: Towards accurate, stable and accelerative penetrating-rank estimation on large graphs. In: Proceedings of Web-Age Information Management. 2011, 415–429
19 Yu W, Le J, Lin X, Zhang W. On the efficiency of estimating penetrating rank on large graphs. In: Proceedings of Scientific and Statistical Database Management. 2012, 231–249
https://doi.org/10.1007/978-3-642-31235-9_15
20 Fujiwara Y, Nakatsuji M, Shiokawa H, Mishima T, Onizuka M. Efficient ad-hoc search for personalized pagerank. In: Proceedings of the ACM SIGMOD International Conference on Management of Data.2013, 445–456
https://doi.org/10.1145/2463676.2463717
21 Albert R, Barabasi A. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002, 74: 47–97
https://doi.org/10.1103/RevModPhys.74.47
22 Jin R, Ruan N, Xiang Y, Wang H. Path-tree: an efficient reachability indexing scheme for large directed graphs. ACM Transaction Database System, 2011, 36(1): 1–44
https://doi.org/10.1145/1929934.1929941
23 Zheng W, Zou L, Feng Y, Chen L, Zhao D. Efficient simrank-based similarity join over large graphs. Proceedings of the VLDB Endowment,2013, 6(7): 493–504
https://doi.org/10.14778/2536349.2536350
[1] Supplementary Material-Highlights in 3-page ppt Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed