|
|
Continuous ranking on uncertain streams |
Cheqing JIN( ), Jingwei ZHANG, Aoying ZHOU |
Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, Shanghai 200062, China |
|
|
Abstract Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and timecomplexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.
|
Keywords
possible world semantics
uncertain data stream
continuous ranking query
sampling
|
Corresponding Author(s):
JIN Cheqing,Email:cqjin@sei.ecnu.edu.cn
|
Issue Date: 01 December 2012
|
|
1 |
Aggarwal C C. Managing and Mining Uncertain Data. Springer , 2009 doi: 10.1007/978-0-387-09690-2
|
2 |
Antova L, Koch C, Olteanu D. From complete to incomplete information and back. In: Proceedings of ACM SIGMOD . 2007, 713-724
|
3 |
Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. The VLDB Journal , 2007, 16(4): 523-544 doi: 10.1007/s00778-006-0004-3
|
4 |
Agrawal P, Benjelloun O, Sarma A, Hayworth C, Nabar S, Sugihara T, Widom J. Trio: a system for data, uncertainty, and lineage. In: Proceedings of VLDB . 2006, 1151-1154
|
5 |
Soliman M, Ilyas I, Chen-Chuan Chang K. Top-k query processing in uncertain databases. In: ICDE . 2007, 896-905
|
6 |
Benjelloun O, Sarma A, Halevy A, Widom J. ULDBs: databases with uncertainty and lineage. In: Proceedings of VLDB . 2006, 953-964
|
7 |
Jiang L X. Learning random forests for ranking. Frontiers of Computer Science in China . 2011, 5(1): 79-86 doi: 10.1007/s11704-010-0388-5
|
8 |
Geng X B, Cheng X Q. Learning multiple metrics for ranking. Frontiers of Computer Science in China . 2011, 5(3): 259-267 doi: 10.1007/s11704-011-0152-5
|
9 |
Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of ACM SIGMOD . 2008, 673-686
|
10 |
Zhang X, Chomicki J. On the semantics and evaluation of top-k queries in probabilistic databases. In: Proceedings of DBRank . 2008, 556-563
|
11 |
Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of ICDE . 2009, 305-316
|
12 |
Ge T, Zdonik S, Madden S. Top-k queries on uncertain data: on score distribution and typical answers. In: Proceedings of ACM SIGMOD . 2009, 375-388
|
13 |
Yan D, Ng W. Robust ranking of uncertain data. In: Proceedings of DASFAA . 2011, 254-268
|
14 |
Jin C, Yi K, Chen L, Yu J, Lin X. Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment , 2008, 1(1): 301-312
|
15 |
Jin C, Gao M, Zhou A. Handling ER-topk query on uncertain streams. In: Proceedings of DASFAA . 2011, 326-340
|
16 |
Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press , 1995, 67-73
|
17 |
Dalvi N, Suciu D. Management of probabilistic data: foundations and challenges. In: Proceedings of PODS . 2007, 1-12
|
18 |
Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of SODA . 2007, 346-355
|
19 |
Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of ACM SIGMOD . 2007, 281-292
|
20 |
Jin C, Zhou M, Zhou A. Computing rarity on uncertain data. SCIENCE CHINA Information Sciences , 2011, 54(10): 2028-2039 doi: 10.1007/s11432-011-4378-5
|
21 |
Aggarwal C, Yu P. A framework for clustering uncertain data streams. In: Proceedings of ICDE . 2008, 150-159
|
22 |
Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of ACM SIGMOD . 2008, 819-832
|
23 |
Zhang W, Lin X, Zhang Y, Wang W, Yu J. Probabilistic skyline operator over sliding windows. In: Proceedings of ICDE . 2009, 1060-1071
|
24 |
Tran T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of SIGMOD . 2010, 159-170
|
25 |
Tran T, McGregor A, Diao Y, Peng L, Liu A. Conditioning and aggregating uncertain data streams: going beyond expectations. Proceedings of the VLDB Endowment , 2010, 3(1-2): 1302-1313
|
26 |
Soliman M, Ilyas I. Ranking with uncertain scores. In: Proceedings of ICDE . 2009, 317-328
|
27 |
Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment , 2009, 2(1): 502-513
|
28 |
Hua M, Pei J. Continuously monitoring top-k uncertain data streams: a probabilistic threshold method. Distributed and Parallel Databases , 2009, 26(1): 29-65 doi: 10.1007/s10619-009-7043-x
|
29 |
Tang M, Li F, Phillips J M, Jestes J. Efficient threshold monitoring for distributed probabilistic data. In: Proceedings of ICDE . 2012
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|