Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci    2012, Vol. 6 Issue (6) : 686-699    https://doi.org/10.1007/s11704-012-1227-7
RESEARCH ARTICLE
Continuous ranking on uncertain streams
Cheqing JIN(), Jingwei ZHANG, Aoying ZHOU
Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, Shanghai 200062, China
 Download: PDF(623 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and timecomplexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.

Keywords possible world semantics      uncertain data stream      continuous ranking query      sampling     
Corresponding Author(s): JIN Cheqing,Email:cqjin@sei.ecnu.edu.cn   
Issue Date: 01 December 2012
 Cite this article:   
Cheqing JIN,Aoying ZHOU,Jingwei ZHANG. Continuous ranking on uncertain streams[J]. Front Comput Sci, 2012, 6(6): 686-699.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-012-1227-7
https://academic.hep.com.cn/fcs/EN/Y2012/V6/I6/686
1 Aggarwal C C. Managing and Mining Uncertain Data. Springer , 2009
doi: 10.1007/978-0-387-09690-2
2 Antova L, Koch C, Olteanu D. From complete to incomplete information and back. In: Proceedings of ACM SIGMOD . 2007, 713-724
3 Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. The VLDB Journal , 2007, 16(4): 523-544
doi: 10.1007/s00778-006-0004-3
4 Agrawal P, Benjelloun O, Sarma A, Hayworth C, Nabar S, Sugihara T, Widom J. Trio: a system for data, uncertainty, and lineage. In: Proceedings of VLDB . 2006, 1151-1154
5 Soliman M, Ilyas I, Chen-Chuan Chang K. Top-k query processing in uncertain databases. In: ICDE . 2007, 896-905
6 Benjelloun O, Sarma A, Halevy A, Widom J. ULDBs: databases with uncertainty and lineage. In: Proceedings of VLDB . 2006, 953-964
7 Jiang L X. Learning random forests for ranking. Frontiers of Computer Science in China . 2011, 5(1): 79-86
doi: 10.1007/s11704-010-0388-5
8 Geng X B, Cheng X Q. Learning multiple metrics for ranking. Frontiers of Computer Science in China . 2011, 5(3): 259-267
doi: 10.1007/s11704-011-0152-5
9 Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of ACM SIGMOD . 2008, 673-686
10 Zhang X, Chomicki J. On the semantics and evaluation of top-k queries in probabilistic databases. In: Proceedings of DBRank . 2008, 556-563
11 Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of ICDE . 2009, 305-316
12 Ge T, Zdonik S, Madden S. Top-k queries on uncertain data: on score distribution and typical answers. In: Proceedings of ACM SIGMOD . 2009, 375-388
13 Yan D, Ng W. Robust ranking of uncertain data. In: Proceedings of DASFAA . 2011, 254-268
14 Jin C, Yi K, Chen L, Yu J, Lin X. Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment , 2008, 1(1): 301-312
15 Jin C, Gao M, Zhou A. Handling ER-topk query on uncertain streams. In: Proceedings of DASFAA . 2011, 326-340
16 Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press , 1995, 67-73
17 Dalvi N, Suciu D. Management of probabilistic data: foundations and challenges. In: Proceedings of PODS . 2007, 1-12
18 Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of SODA . 2007, 346-355
19 Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of ACM SIGMOD . 2007, 281-292
20 Jin C, Zhou M, Zhou A. Computing rarity on uncertain data. SCIENCE CHINA Information Sciences , 2011, 54(10): 2028-2039
doi: 10.1007/s11432-011-4378-5
21 Aggarwal C, Yu P. A framework for clustering uncertain data streams. In: Proceedings of ICDE . 2008, 150-159
22 Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of ACM SIGMOD . 2008, 819-832
23 Zhang W, Lin X, Zhang Y, Wang W, Yu J. Probabilistic skyline operator over sliding windows. In: Proceedings of ICDE . 2009, 1060-1071
24 Tran T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of SIGMOD . 2010, 159-170
25 Tran T, McGregor A, Diao Y, Peng L, Liu A. Conditioning and aggregating uncertain data streams: going beyond expectations. Proceedings of the VLDB Endowment , 2010, 3(1-2): 1302-1313
26 Soliman M, Ilyas I. Ranking with uncertain scores. In: Proceedings of ICDE . 2009, 317-328
27 Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment , 2009, 2(1): 502-513
28 Hua M, Pei J. Continuously monitoring top-k uncertain data streams: a probabilistic threshold method. Distributed and Parallel Databases , 2009, 26(1): 29-65
doi: 10.1007/s10619-009-7043-x
29 Tang M, Li F, Phillips J M, Jestes J. Efficient threshold monitoring for distributed probabilistic data. In: Proceedings of ICDE . 2012
[1] Xu-Ying LIU, Sheng-Tao WANG, Min-Ling ZHANG. Transfer synthetic over-sampling for class-imbalance learning with limited minority class data[J]. Front. Comput. Sci., 2019, 13(5): 996-1009.
[2] Minqi ZHOU, Rong ZHANG, Weining QIAN, Aoying ZHOU. Distribution-free data density estimation in large-scale networks[J]. Front. Comput. Sci., 2018, 12(6): 1220-1240.
[3] Ziting ZHOU, Pengpeng ZHAO, Victor S. SHENG, Jiajie XU, Zhixu LI, Jian WU, Zhiming CUI. Efficient sampling methods for characterizing POIs on maps based on road networks[J]. Front. Comput. Sci., 2018, 12(3): 582-592.
[4] Bo SUN, Haiyan CHEN, Jiandong WANG, Hua XIE. Evolutionary under-sampling based bagging ensemble method for imbalanced data classification[J]. Front. Comput. Sci., 2018, 12(2): 331-350.
[5] Qian LI, Gang LI, Wenjia NIU, Yanan CAO, Liang CHANG, Jianlong TAN, Li GUO. Boosting imbalanced data learning with Wiener process oversampling[J]. Front. Comput. Sci., 2017, 11(5): 836-851.
[6] Xiangxiang ZENG,Sisi YUAN,Xianxian HUANG,Quan ZOU. Identification of cytokine via an improved genetic algorithm[J]. Front. Comput. Sci., 2015, 9(4): 643-651.
[7] Lijin WANG,Yilong YIN,Yiwen ZHONG. Cuckoo search with varied scaling factor[J]. Front. Comput. Sci., 2015, 9(4): 623-635.
[8] Bo YUAN, Wenhuang LIU. Measure oriented training: a targeted approach to imbalanced classification problems[J]. Front Comput Sci, 2012, 6(5): 489-497.
[9] Tantan LIU, Fan WANG, Gagan AGRAWAL. Stratified sampling for data mining on the deep web[J]. Front Comput Sci, 2012, 6(2): 179-196.
[10] Jialu HU , Lin GAO , Guimin QIN , . Evaluation of subgraph searching algorithms for detecting network motifs in biological networks[J]. Front. Comput. Sci., 2009, 3(3): 412-416.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed