Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (1) : 146-162    https://doi.org/10.1007/s11704-016-5525-3
RESEARCH ARTICLE
Handling query skew in large indexes: a view based approach
Weihuang HUANG(), Jeffrey Xu YU, Zechao SHANG
Systems Engineering and EngineeringManagement, The Chinese University of Hong Kong, Hong Kong, China
 Download: PDF(1136 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Indexing is one of the most important techniques to facilitate query processing over a multi-dimensional dataset. A commonly used strategy for such indexing is to keep the tree-structured index balanced. This strategy reduces query processing cost in the worst case, and can handle all different queries equally well. In other words, this strategy implies that all queries are uniformly issued, which is partially because the query distribution is not possibly known and will change over time in practice. A key issue we study in this work is whether it is the best to fully rely on a balanced tree-structured index in particular when datasets become larger and larger in the big data era. This means that, when a dataset becomes very large, it becomes unreasonable to assume that all data in any subspace are equally important and are uniformly accessed by all queries at the index level. Given the existence of query skew and the possible changes of query skew, in this paper, we study how to handle such query skew and such query skew changes at the index level without sacrifice of supporting any possible queries in a wellbalanced tree index and without a high overhead. To tackle the issue, we propose index-view at the index level, where an index-view is a short-cut in a balanced tree-structured index to access objects in the subspaces that are more frequently accessed, and propose a new index-view-centric framework for query processing using index-views in a bottom-up manner. We study index-views selection problem in both static and dynamic setting, and we confirm the effectiveness of our approach using large real and synthetic datasets.

Keywords multi-dimensional index      query adaptive      indexview     
Corresponding Author(s): Weihuang HUANG   
Just Accepted Date: 18 July 2016   Online First Date: 22 September 2017    Issue Date: 12 January 2018
 Cite this article:   
Weihuang HUANG,Jeffrey Xu YU,Zechao SHANG. Handling query skew in large indexes: a view based approach[J]. Front. Comput. Sci., 2018, 12(1): 146-162.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5525-3
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I1/146
1 Guttman A. R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM Special Interest Group on Management of Data. 1984, 47–57
https://doi.org/10.1145/602259.602266
2 Finkel R A, Bentley J L. Quad trees: a data structure for retrieval on composite keys. Acta Informatica, 1974, 4(1): 1–9
https://doi.org/10.1007/BF00288933
3 Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 1975, 18(9): 509–517
https://doi.org/10.1145/361002.361007
4 Samet H. Foundations of Multidimensional and Metric Data Structures. San Francisco, CA: Morgan Kaufmann, 2006
5 Silva-Filho Y V. Average case analysis of region search in balanced k-d trees. Information Processing Letters, 1979, 8(5): 219–223
https://doi.org/10.1016/0020-0190(79)90110-8
6 Silverstein C, Henzinger M R, Marais H, Moricz M.Analysis of a very large web search engine query log. SIGIR Forum, 1999, 33(1): 6–12
https://doi.org/10.1145/331403.331405
7 Gonzalez M C, Hidalgo C A, Barabasi A L. Understanding individual human mobility patterns. Nature, 2008, 453(7196): 779–782
https://doi.org/10.1038/nature06958
8 Yuan J, Zheng Y, Zhang C Y, Xie W L, Xie X, Sun G Z, Huang Y. Tdrive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2010, 99–108
9 Levandoski J J, Sarwat M, Eldawy A, Mokbel M F. Lars: a locationaware recommender system. In: Proceedings of the 28th IEEE International Conference on Data Engineering. 2012, 450–461
https://doi.org/10.1109/ICDE.2012.54
10 Lee R, Wakamiya S, Sumiya K. Discovery of unusual regional social activities using geo-tagged microblogs. World WideWeb, 2011, 14(4): 321–349
https://doi.org/10.1007/s11280-011-0120-x
11 Arya S, Mount D M, Netanyahu N S, Silverman R, Wu A Y. An optimal algorithm for approximate nearest neighbor searching. In: Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms. 1994, 573–582
12 Roy S B, Chakrabarti K. Location-aware type ahead search on spatial databases: semantics and efficiency. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2011, 361–372
13 Friedman J H, Bentley J L, Finkel R A. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 1977, 3(3): 209–226
https://doi.org/10.1145/355744.355745
14 Papadias D, Shen Q M, Tao Y F, Mouratidis K. Group nearest neighbor queries. In: Proceedings of the 20th IEEE International Conference on Data Engineering. 2004, 301–312
https://doi.org/10.1109/ICDE.2004.1320006
15 Felipe I D, Hristidis V, Rishe N. Keyword search on spatial databases. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 656–665
https://doi.org/10.1109/ICDE.2008.4497474
16 Cong G, Jensen C S, Wu D M. Efficient retrieval of the top-k most relevant spatial Web objects. The Proceedings of the VLDB Endowment, 2009, 2(1): 337–348
https://doi.org/10.14778/1687627.1687666
17 Cao X, Cong G, Jensen C S, Ooi B C. Collective spatial keyword querying. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2011, 373–384
https://doi.org/10.1145/1989323.1989363
18 Li G L, Feng J H, Xu J. Desks: direction-aware spatial keyword search. In: Proceedings of the 28th IEEE International Conference on Data Engineering. 2012, 474–485
https://doi.org/10.1109/ICDE.2012.93
19 Sheng C, Tao Y F. FIFO indexes for decomposable problems. In: Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2011, 25–35
https://doi.org/10.1145/1989284.1989291
20 Hjaltason G R, Samet H. Distance browsing in spatial databases. ACM Transactions on Database Systems, 1999, 24(2): 265–318
https://doi.org/10.1145/320248.320255
21 Nemhauser G L, Wolsey L A, Fisher M L. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 1978, 14(1): 265–294
https://doi.org/10.1007/BF01588971
22 Feige U. A threshold of ln n for approximating set cover. Journal of the ACM, 1998, 45(4): 634–652
https://doi.org/10.1145/285055.285059
23 Sviridenko M. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 2004, 32(1): 41–43
https://doi.org/10.1016/S0167-6377(03)00062-2
24 Berinde R, Cormode G, Indyk P, Strauss M J. Space-optimal heavy hitters with strong error bounds. In: Proceedings of ACM SIGMODSIGACT- SIGART Symposium on Principles of Database Systems. 2009, 157–166
https://doi.org/10.1145/1559795.1559819
25 Metwally A, Agrawal D, El Abbadi A. Efficient computation of frequent and top-k elements in data streams. In: Proceedings of International Conference on Database Theory. 2005, 398–412
26 Cudré-Mauroux P, Wu E, Madden S. Trajstore: an adaptive storage system for very large trajectory data sets. In: Proceedings of the 26th IEEE International Conference on Data Engineering. 2010, 109–120
https://doi.org/10.1109/ICDE.2010.5447829
27 Achakeev D, Seeger B, Widmayer P. Sort-based query-adaptive loading of R-trees. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 2080–2084
https://doi.org/10.1145/2396761.2398577
28 Sleator D D, Tarjan R E. Self-adjusting binary search trees. Journal of the ACM, 1985, 32(3): 652–686
https://doi.org/10.1145/3828.3835
29 Park E, Mount D M. A self-adjusting data structure for multidimensional point sets. In: Proceedings of European Symposium on Algorithms. 2012, 778–789
https://doi.org/10.1007/978-3-642-33090-2_67
30 Idreos S, Kersten M L, Manegold S. Database cracking. In: Proceedings of Innovative Data Systems Research. 2007, 68–78
31 Tzoumas K, Yiu M L, Jensen C S. Workload-aware indexing of continuously moving objects. Proceedings of the VLDB Endowment, 2009, 2(1): 1186–1197
https://doi.org/10.14778/1687627.1687761
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed