Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    0, Vol. Issue () : 387-398    https://doi.org/10.1007/s11704-015-4515-1
Big graph search: challenges and techniques
Shuai MA,Jia LI,Chunming HU(),Xuelian LIN,Jinpeng HUAI
State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
 Download: PDF(484 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

On one hand, compared with traditional relational and XML models, graphs have more expressive power and are widely used today. On the other hand, various applications of social computing trigger the pressing need of a new search paradigm. In this article, we argue that big graph search is the one filling this gap. We first introduce the application of graph search in various scenarios. We then formalize the graph search problem, and give an analysis of graph search from an evolutionary point of view, followed by the evidences from both the industry and academia. After that, we analyze the difficulties and challenges of big graph search. Finally, we present three classes of techniques towards big graph search: query techniques, data techniques and distributed computing techniques.

Keywords graph search      big data      query techniques      data techniques      distributed computing     
Corresponding Author(s): Chunming HU   
Just Accepted Date: 22 April 2015   Online First Date: 06 April 2016    Issue Date: 16 May 2016
 Cite this article:   
Shuai MA,Jia LI,Chunming HU, et al. Big graph search: challenges and techniques[J]. Front. Comput. Sci., 0, (): 387-398.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-4515-1
https://academic.hep.com.cn/fcs/EN/Y0/V/I/387
1 Cukier K. Data, data everywhere: a special report on managing information. Economist Newspaper, 2010
2 Ma S, Li J, Liu X, Huai J. Graph search: a new searching approach to the social computing era. Communications of CCF, 2012, 8(11): 26–31
3 Ma S, Cao Y, Wo T, Huai J. Social networks and graph matching. Communications of CCF, 2012, 8(4): 20–24
4 Ma S, Li J, Liu X, Huai J. Graph search in the big data era. Information and Communications Technologies, 2013, 6: 44–51
5 Tian Y, Patel J M. Tale: A tool for approximate large graph matching. In: Proceedings of IEEE the 24th International Conference on Data Engineering. 2008, 963–972
6 Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y. Graph pattern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment, 2010, 3(1): 264–275
7 Barcelo P, Hurtado C A, Libkin L, Wood P T. Expressive languages for path queries over graph-structured data. In: Proceedings of the 29th ACM Symposium on Principles of Database Systems. 2010, 3–14
8 Feng K, Cong G, Bhowmick S S, Ma S. In search of influential event organizers in online social networks. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 63–74
9 Maserrat H, Pei J. Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 533–542
10 Schenker A, Last M, Bunke H, Kandel A. Classification of web documents using graph matching. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18(3): 475–496
11 Fan W, Li J, Ma S, Wang H, Wu Y. Graph homomorphism revisited for graph matching. Proceedings of the VLDB Endowment, 2010, 3(1): 1161–1172
12 Terveen L G, McDonald D W. Social matching: a framework and research agenda. ACM Transactions on Computer-Human Interaction, 2005, 12(3): 401–434
13 Ma S, Cao Y, Fan W, Huai J, Wo T. Capturing topology in graph pattern matching. Proceedings of the VLDB Endowment, 2011, 5(4): 310–321
14 Ma S, Cao Y, Fan W, Huai J, Wo T. Strong simulation: capturing topology in graph pattern matching. ACM Transactions on Database Systems, 2014, 39(1)
15 Eckerson W. Data quality and the bottom line: achieving business success through a commitment to high quality data. TDWI Report. 2002
16 Otto B, Weber K. From health checks to the seven sisters: the data quality journey at bt. Report: BT TR-BE HSG/CC CDQ/8. 2009
17 Fan W, Li J, Ma S, Tang N, Yu W. Interaction between record matching and data repairing. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 2011, 469–480
18 Ullmann J R. An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23(1): 31–42
19 Liu C, Chen C, Han J, Yu P S. Gplag: detection of software plagiarism by program dependence graph analysis. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 872–881
20 Ferrante J, Ottenstein K J, Warren J D. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 1987, 9(3): 319–349
21 Rice M N, Tsotras V J. Graph indexing of road networks for shortest path queries with label restrictions. Proceedings of the VLDB Endowment, 2010, 4(2): 69–80
22 Cormen T H, Leiserson C E, Rivest R L, Stein C. Introduction to Algorithms. Cambridge: The MIT Press, 2001
23 Chen Z, Shen H T, Zhou X, Yu J X. Monitoring path nearest neighbor in road networks. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009, 591–602
24 Chowdhury N M M K, Rahman M R, Boutaba R. Virtual network embedding with coordinated node and link mapping. In: Proceedings of IEEE 28th Conference on Computer Communications. 2009, 783–791
25 Conte D, Foggia P, Sansone C, Vento M. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial, 2004, 18(3): 265–298
26 Karypis G, Aggarwal R, Kumar V, Shekhar S. Multilevel hypergraph partitioning: applications in vlsi domain. IEEE Transactions on Very Large Scale Integration Systems, 1999, 7(1): 69–79
27 Fan W, Li J, Ma S, Tang N, Wu Y. Adding regular expressions to graph reachability and pattern queries. In: Proceedings of IEEE the 27th Conference on Data Engineering. 2011, 39–50
28 Hansen P B, ed. Classic Operating Systems. New York: Springer, 2001
29 Ramakrishnan R, Gehrke J. Database Management Systems. New York: McGraw-Hill Higher Education, 2000
30 Abiteboul S, Hull R, Vianu V. Foundations of Databases. Addison-Wesley, 1995
31 Sakr S, Pardede E, eds. Graph Data Management: Techniques and Applications. IGI Global, 2011
32 Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 135–146
33 Yang S, Wu Y, Sun H, Yan X. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment, 2014, 7(7): 565–576
34 Beitzel S M, Jensen E C, Frieder O, Lewis D D, Chowdhury A, Kolcz A. Improving automatic query classification via semi-supervised learning. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 42–49
35 Shen D, Sun J T, Yang Q, Chen Z. Building bridges for web query classification. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006, 131–138
36 Xing Q, Liu Y, Nie J Y, Zhang M, Ma S, Zhang K. Incorporating user preferences into click models. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. 2013, 1301–1310
37 Hu B, Zhang Y, Chen W, Wang G, Yang Q. Characterizing search intent diversity into click models. In: Proceedings of the 20th International Conference on World Wide Web. 2011, 17–26
38 Maria G, Symeon P, Athena V. Massive graph management for the Web and Web 2.0. New Directions in Web Data Management 1. Springer, 2011, 19–58
39 Newman M, Barabási A L, Watts D J.The Structure and Dynamics of Networks. Princeton: Princeton University Press, 2006
40 Rahm E, Do H H. Data cleaning: problems and current approaches. IEEE Data Engineering Bulletin, 2000, 23(4): 3–13
41 Fan W, Li J, Ma S, Tang N, Yu W. Towards certain fixes with editing rules and master data. The International Journal on Very Large Data Bases, 2012, 21(2): 213–238
42 Henzinger M R, Henzinger T A, Kopke P W. Computing simulations on finite and infinite graphs. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science. 1995, 453–462
43 Ramalingam G, Reps T W. A categorized bibliography on incremental computation. In: Proceedings of the 20th Symposium on Principles of Programming Languages. 1993, 502–510
44 Ramalingam G, Reps T W. On the computational complexity of dynamic graph problems. Theoretical Computer Science, 1996, 158(1): 233–277
45 Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th USENIX Conference on Operating System Design and Implementation. 2004, 137–149
46 Peng D, Dabek F. Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. 2010, 1–15
47 Papadimitriou C H. Computational Complexity. Addison-Wesley, 1994
48 Yu W, Aggarwal C C, Ma S, Wang H. On anomalous hotspot discovery in graph streams. In: Proceedings of the 13th IEEE International Conference on Data Mining. 2013, 1271–1276
49 Aggarwal C C, Wang H. Managing and Mining Graph Data. New York: Springer, 2010
50 Jordan M I. Divide-and-conquer and statistical inference for big data. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 4–4
51 Kleiner A, Talwalkar A, Sarkar P, Jordan M I. The big data bootstrap. In: Proceedings of the 29th International Conference onMachine Learning. 2012, 1759–1766
52 Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 1970, 49(2): 291–307
53 Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 1998, 20(1): 359–392
54 Yang S, Yan X, Zong B, Khan A. Towards effective partition management for large graphs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 2012, 517–528
55 Salomon D. Data compression: The Complete Reference. 4th ed.New York: Springer, 2007
56 Buehrer G, Chellapilla K. A scalable pattern mining approach to Web graph compression with communities. In: Proceedings of the 2008 International Conference onWeb Search and Data Mining. 2008, 95–106
57 Adler M, Mitzenmacher M. Towards compressingWeb graphs. In: Proceedings of Data Compression Conference. 2001, 203–212
58 Boldi P, Vigna S. The WebGraph framework I: compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. 2004, 595–602
59 Feder T, Motwani R. Clique partitions, graph compression and speeding-up algorithms. Journal of Computer and System Sciences, 1995, 51(2): 261–272
60 Karande C, Chellapilla K, Andersen R. Speeding up algorithms on compressed Web graphs. In: Proceedings of the 2009 International Conference on Web Search and Data Mining. 2009, 272–281
61 Fan W, Li J, Wang X, Wu Y. Query preserving graph compression. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 2012, 157–168
62 Baeza-Yates R A, Ribeiro-Neto B A. Modern Information Retrieval: the concepts and technology behind search. 2nd ed. Harlow: Pearson Education Ltd., 2011
63 Klein K, Kriege N, Mutzel P. CT-Index: Fingerprint-based graph indexing combining cycles and trees. In: Proceedings of IEEE the 27th International Conference on Data Engineering. 2011, 1115–1126
64 Lynch N A. Distributed Algorithms. San Francisco: Morgan Kaufmann, 1996
65 Peleg D. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000
66 Ma S, Cao Y, Huai J, Wo T. Distributed graph pattern matching. In: Proceedings of the 21st International Conference on World Wide Web. 2012, 949–958
67 Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. 2012, 15–28
68 Gao J, Zhou J, Zhou C, Yu J X. Glog: A high level graph analysis system using mapreduce. In: Proceedings of IEEE the 30th International Conference on Data Engineering. 2014, 544–555
69 Qin L, Yu J X, Chang L, Cheng H, Zhang C, Lin X. Scalable big graph processing in mapreduce. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 827–838
70 Xin R S, Gonzalez J E, Franklin M J, Stoica I. Graphx: a resilient distributed graph system on spark. In: Proceeding of the 1st International Workshop on Graph DataManagement Experiences and Systems. 2013
71 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Distributed graphlab: a framework for machine learning in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
72 Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 17–30
73 Fan W, Huai J. Querying big data: bridging theory and practice. Journal of Computer Science and Technology, 2014, 29(5): 849–869
[1] Yulin HE, Jiaqi CHEN, Jiaxing SHEN, Philippe FOURNIER-VIGER, Joshua Zhexue HUANG. Density estimation-based method to determine sample size for random sample partition of big data[J]. Front. Comput. Sci., 2024, 18(5): 185322-.
[2] Ashish SINGH, Abhinav KUMAR, Suyel NAMASUDRA. DNACDS: Cloud IoE big data security and accessing scheme based on DNA cryptography[J]. Front. Comput. Sci., 2024, 18(1): 181801-.
[3] Muazzam MAQSOOD, Sadaf YASMIN, Saira GILLANI, Maryam BUKHARI, Seungmin RHO, Sang-Soo YEO. An efficient deep learning-assisted person re-identification solution for intelligent video surveillance in smart cities[J]. Front. Comput. Sci., 2023, 17(4): 174329-.
[4] Linlin DING, Shu WANG, Baoyan SONG. Efficient k-dominant skyline query over incomplete data using MapReduce[J]. Front. Comput. Sci., 2021, 15(4): 154611-.
[5] Zhihan JIANG, Yan LIU, Xiaoliang FAN, Cheng WANG, Jonathan LI, Longbiao CHEN. Understanding urban structures and crowd dynamics leveraging large-scale vehicle mobility data[J]. Front. Comput. Sci., 2020, 14(5): 145310-.
[6] Meifan ZHANG, Hongzhi WANG, Jianzhong LI, Hong GAO. Diversification on big data in query processing[J]. Front. Comput. Sci., 2020, 14(4): 144607-.
[7] Samuel IRVING, Bin LI, Shaoming CHEN, Lu PENG, Weihua ZHANG, Lide DUAN. Computer comparisons in the presence of performance variation[J]. Front. Comput. Sci., 2020, 14(1): 21-41.
[8] Xingyue CHEN, Tao SHANG, Feng ZHANG, Jianwei LIU, Zhenyu GUAN. Dynamic data auditing scheme for big data storage[J]. Front. Comput. Sci., 2020, 14(1): 219-229.
[9] Wenjie LIU, Zhanhuai LI. An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment[J]. Front. Comput. Sci., 2019, 13(6): 1309-1325.
[10] Peng PENG, Lei ZOU, Zhenqin DU, Dongyan ZHAO. Using partial evaluation in holistic subgraph search[J]. Front. Comput. Sci., 2018, 12(5): 966-983.
[11] Yu ZHOU, Nvqi ZHOU, Tingting HAN, Jiayi GU, Weigang WU. Probabilistic verification of hierarchical leader election protocol in dynamic systems[J]. Front. Comput. Sci., 2018, 12(4): 763-776.
[12] Min NIE, Lei YANG, Jun SUN, Han SU, Hu XIA, Defu LIAN, Kai YAN. Advanced forecasting of career choices for college students based on campus big data[J]. Front. Comput. Sci., 2018, 12(3): 494-503.
[13] Xuegang HU, Peng ZHOU, Peipei LI, Jing WANG, Xindong WU. A survey on online feature selection with streaming features[J]. Front. Comput. Sci., 2018, 12(3): 479-493.
[14] Wuyang JU,Jianxin LI,Weiren YU,Richong ZHANG. iGraph: an incremental data processing system for dynamic graph[J]. Front. Comput. Sci., 2016, 10(3): 462-476.
[15] Qinma KANG, Hong HE. Task assignment for minimizing application completion time using honeybee mating optimization[J]. Front Comput Sci, 2013, 7(3): 404-415.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed