Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2020, Vol. 14, Issue 1: 162–190. https://doi.org/10.1007/s11704-017-7063-z
REVIEW ARTICLE
A survey of uncertain data management
Lingli LI¹, Hongzhi WANG², Jianzhong LI², Hong GAO²
1. Department of Computer Science and Technology, Heilongjiang University, Heilongjiang 150001, China
2. Department of Computer Science and Technology, Harbin Institute of Technology, Heilongjiang 150001, China
Abstract

Uncertain data, that is, data accompanied by information about their uncertainty, arise widely in database applications. In recent years, uncertainty in data has posed challenges in almost every area of database management, including data modeling, query representation, query processing, and data mining. Consequently, uncertain data management has become an active research topic in the field of data management. In this study, we explore the problems involved in managing uncertain data, present state-of-the-art solutions, and suggest future research directions in this area. The techniques discussed cover data modeling, query processing, and data mining for uncertain data in relational, XML, graph, and stream forms.

Keywords: uncertain data, probabilistic database, probabilistic XML, semi-structured data, data stream
Corresponding Author(s): Hongzhi WANG
Just Accepted Date: 07 July 2017; Online First Date: 07 September 2018; Issue Date: 24 September 2019
Cite this article:
Lingli LI, Hongzhi WANG, Jianzhong LI, et al. A survey of uncertain data management[J]. Front. Comput. Sci., 2020, 14(1): 162–190.
URL:
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-7063-z
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I1/162
1 N Fuhr, T Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32–66
https://doi.org/10.1145/239041.239045
2 T Imieliński, W Lipski. Incomplete information in relational databases. Journal of the ACM, 1984, 31(4): 761–791
https://doi.org/10.1145/1634.1886
3 D Barbará, H Garcia-Molina, D Porter. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487–502
https://doi.org/10.1109/69.166990
4 L V Lakshmanan, N Leone, R Ross, V S Subrahmanian. Probview: a flexible probabilistic database system. ACM Transactions on Database Systems, 1997, 22(3): 419–469
https://doi.org/10.1145/261124.261131
5 E Zimányi. Query evaluation in probabilistic relational databases. Theoretical Computer Science, 1997, 171(1): 179–219
https://doi.org/10.1016/S0304-3975(96)00129-6
6 P Sen, A Deshpande. Representing and querying correlated tuples in probabilistic databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 596–605
https://doi.org/10.1109/ICDE.2007.367905
7 D Suciu. Probabilistic databases. SIGACT News, 2008, 39(2): 111–124
https://doi.org/10.1145/1388240.1388260
8 R Cavallo, M Pittarelli. The theory of probabilistic databases. In: Proceedings of the 13th International Conference on Very Large Data Bases. 1987, 71–81
9 O Benjelloun, A D Sarma, A Halevy, J Widom. ULDBS: databases with uncertainty and lineage. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 953–964
10 P Sen, A Deshpande, L Getoor. Read-once functions and query evaluation in probabilistic databases. Proceedings of the VLDB Endowment, 2010, 3(1–2): 1068–1079
https://doi.org/10.14778/1920841.1920975
11 D Olteanu, J Huang. Using OBDDs for efficient query evaluation on probabilistic databases. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 326–340
https://doi.org/10.1007/978-3-540-87993-0_26
12 S Roy, V Perduca, V Tannen. Faster query answering in probabilistic databases using read-once functions. In: Proceedings of the 14th International Conference on Database Theory. 2011, 232–243
https://doi.org/10.1145/1938551.1938582
13 B Kenig, A Gal, O Strichman. A new class of lineage expressions over probabilistic databases computable in P-time. In: Proceedings of the 7th International Conference on Scalable Uncertainty Management. 2013, 219–232
https://doi.org/10.1007/978-3-642-40381-1_17
14 J Widom. Trio: a system for integrated management of data, accuracy, and lineage. Stanford Infolab, 2004
15 L Antova, C Koch, D Olteanu. Maybms: managing incomplete information with probabilistic world-set decompositions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 1479–1480
https://doi.org/10.1109/ICDE.2007.369042
16 R Cheng, S Singh, S Prabhakar. U-DBMS: a database system for managing constantly-evolving data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 1271–1274
17 J Boulos, N Dalvi, B Mandhani, S Mathur, C Re, D Suciu. Mystiq: a system for finding more answers by using probabilities. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 891–893
https://doi.org/10.1145/1066157.1066277
18 D Olteanu, J Huang, C Koch. Sprout: lazy vs. eager query plans for tuple-independent probabilistic databases. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 640–651
https://doi.org/10.1109/ICDE.2009.123
19 B Kimelfeld, Y Kosharovsky, Y Sagiv. Query evaluation over probabilistic XML. The International Journal on Very Large Data Bases, 2009, 18(5): 1117–1140
https://doi.org/10.1007/s00778-009-0150-5
20 P Senellart, A Souihli. Proapprox: a lightweight approximation query processor over probabilistic trees. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 2011, 1295–1298
https://doi.org/10.1145/1989323.1989480
21 E Welbourne, N Khoussainova, J Letchner, Y Li, M Balazinska, G Borriello, D Suciu. Cascadia: a system for specifying, detecting, and managing rfid events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
https://doi.org/10.1145/1378600.1378631
22 T T Tran, L Peng, B Li, Y Diao, A Liu. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 159–170
https://doi.org/10.1145/1807167.1807187
23 T T Tran, L Peng, Y Diao, A McGregor, A Liu. Claro: modeling and processing uncertain data streams. The International Journal on Very Large Data Bases, 2012, 21(5): 651–676
https://doi.org/10.1007/s00778-011-0261-7
24 C C Aggarwal, P S Yu. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(5): 609–623
https://doi.org/10.1109/TKDE.2008.190
25 A Y Zhou. A survey on the management of uncertain data. Chinese Journal of Computers, 2009, 32(1): 1–16
https://doi.org/10.3724/SP.J.1016.2009.00001
26 B Kimelfeld, P Senellart. Probabilistic XML: Models and Complexity. Advances in Probabilistic Databases for Uncertain Information Management, Springer, Berlin, Heidelberg, 2013, 39–66
https://doi.org/10.1007/978-3-642-37509-5_3
27 A D Sarma, O Benjelloun, A Halevy, J Widom. Working models for uncertain data. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 7
https://doi.org/10.1109/ICDE.2006.174
28 T J Green, V Tannen. Models for incomplete and probabilistic information. In: Proceedings of the International Conference on Extending Database Technology. 2006, 278–296
https://doi.org/10.1007/11896548_24
29 P Sen, A Deshpande, L Getoor. PRDB: managing and exploiting rich correlations in probabilistic databases. The International Journal on Very Large Data Bases, 2009, 18(5): 1065–1090
https://doi.org/10.1007/s00778-009-0153-2
30 R Chen, Y Mao, I Kiringa. GRN model of probabilistic databases: construction, transition and querying. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 291–302
https://doi.org/10.1145/1807167.1807201
31 R Cheng, Y Xia, S Prabhakar, R Shah, J S Vitter. Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 876–887
https://doi.org/10.1016/B978-012088469-8.50077-2
32 Y Tao, R Cheng, X Xiao, W K Ngai, B Kao, S Prabhakar. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 922–933
33 D Burdick, P M Deshpande, T Jayram, R Ramakrishnan, S Vaithyanathan. Olap over uncertain and imprecise data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 970–981
34 T Jayram, S Kale, E Vee. Efficient aggregation algorithms for probabilistic data. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007, 346–355
35 N Dalvi, D Suciu. Efficient query evaluation on probabilistic databases. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 864–875
https://doi.org/10.1016/B978-012088469-8.50076-0
36 G Cormode, M Garofalakis. Sketching probabilistic data streams. In: Proceedings of the ACM SIGMOD International Conference onManagement of Data. 2007, 281–292
https://doi.org/10.1145/1247480.1247513
37 R Ross, V Subrahmanian, J Grant. Aggregate operators in probabilistic databases. Journal of the ACM, 2005, 52(1): 54–101
https://doi.org/10.1145/1044731.1044734
38 B Kanagal, A Deshpande. Efficient query evaluation over temporally correlated probabilistic streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1315–1318
39 D Burdick, P M Deshpande, T Jayram, R Ramakrishnan, S Vaithyanathan. Efficient allocation algorithms for olap over imprecise data. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 391–402
40 C Ré, D Suciu. The trichotomy of having queries on a probabilistic database. The International Journal on Very Large Data Bases, 2009, 18(5): 1091–1116
https://doi.org/10.1007/s00778-009-0151-4
41 R Fink, L Han, D Olteanu. Aggregation in probabilistic databases via knowledge compilation. Proceedings of the VLDB Endowment, 2012, 5(5): 490–501
https://doi.org/10.14778/2140436.2140445
42 W K Ngai, B Kao, C K Chui, R Cheng, M Chau, K Y Yip. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining. 2006, 436–445
https://doi.org/10.1109/ICDM.2006.63
43 P Agrawal, J Widom. Confidence-aware join algorithms. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 628–639
https://doi.org/10.1109/ICDE.2009.141
44 R Cheng, S Singh, S Prabhakar, R Shah, J S Vitter, Y Xia. Efficient join processing over uncertain data. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 738–747
https://doi.org/10.1145/1183614.1183719
45 H P Kriegel, P Kunath, M Pfeifle, M Renz. Probabilistic similarity join on uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2006, 295–309
https://doi.org/10.1007/11733836_22
46 V Ljosa, A K Singh. Top-k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 566–575
https://doi.org/10.1109/ICDE.2008.4497465
47 J Jestes, F Li, Z Yan, K Yi. Probabilistic string similarity joins. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 327–338
https://doi.org/10.1145/1807167.1807204
48 X Lian, L Chen. Set similarity join on probabilistic data. Proceedings of the VLDB Endowment, 2010, 3(1–2): 650–659
https://doi.org/10.14778/1920841.1920924
49 P Andritsos, A Fuxman, R J Miller. Clean answers over dirty databases: probabilistic approach. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 30
https://doi.org/10.1109/ICDE.2006.35
50 M Wick, A McCallum, G Miklau. Scalable probabilistic databases with factor graphs and mcmc. Proceedings of the VLDB Endowment, 2010, 3(1–2): 794–804
https://doi.org/10.14778/1920841.1920942
51 Y Qi, R Jain, S Singh, S Prabhakar. Threshold query optimization for uncertain data. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 315–326
https://doi.org/10.1145/1807167.1807203
52 K F Moore, V Rastogi, C Ré, D Suciu. Query containment of tier-2 queries over a probabilistic database. In: Proceedings of the VLDB Workshop on Management of Uncertain Data. 2010, 47–62
53 T Ge, D Grabiner, S Zdonik. Monte carlo query processing of uncertain multidimensional array data. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 936–947
https://doi.org/10.1109/ICDE.2011.5767887
54 M A Soliman, I F Ilyas, K C C Chang. Top-k query processing in uncertain databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 896–905
https://doi.org/10.1109/ICDE.2007.367935
55 K Yi, F Li, G Kollios, D Srivastava. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12): 1669–1682
https://doi.org/10.1109/TKDE.2008.90
56 Y K Huang, C C Chen, C Lee. Continuous k-nearest neighbor query for moving objects with uncertain velocity. GeoInformatica, 2009, 13(1): 1–25
https://doi.org/10.1007/s10707-007-0041-0
57 X Zhang, J Chomicki. Semantics and evaluation of top-k queries in probabilistic databases. Distributed and Parallel Databases, 2009, 26(1): 67–126
https://doi.org/10.1007/s10619-009-7050-y
58 M Hua, J Pei, W Zhang, X Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 673–686
https://doi.org/10.1145/1376616.1376685
59 G Cormode, F Li, K Yi. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 305–316
https://doi.org/10.1109/ICDE.2009.75
60 T Ge, S Zdonik, S Madden. Top-k queries on uncertain data: on score distribution and typical answers. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. 2009, 375–388
https://doi.org/10.1145/1559845.1559886
61 M A Soliman, I F Ilyas. Ranking with uncertain scores. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 317–328
https://doi.org/10.1109/ICDE.2009.102
62 J Li, A Deshpande. Ranking continuous probabilistic datasets. Proceedings of the VLDB Endowment, 2010, 3(1–2): 638–649
https://doi.org/10.14778/1920841.1920923
63 R Cheng, J Chen, M Mokbel, C Y Chow. Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 973–982
https://doi.org/10.1109/ICDE.2008.4497506
64 R Cheng, L Chen, J Chen, X Xie. Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 672–683
https://doi.org/10.1145/1516360.1516438
65 Y Zhang, X Lin, G Zhu, W Zhang, Q Lin. Efficient rank based knn query processing over uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 28–39
https://doi.org/10.1109/ICDE.2010.5447874
66 X Lian, L Chen. Probabilistic group nearest neighbor queries in uncertain databases. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 809–824
https://doi.org/10.1109/TKDE.2008.41
67 S M Yuen, Y Tao, X Xiao, J Pei, D Zhang. Superseding nearest neighbor search on uncertain spatial databases. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(7): 1041–1055
https://doi.org/10.1109/TKDE.2009.137
68 M A Cheema, X Lin, W Wang, W Zhang, J Pei. Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 550–564
https://doi.org/10.1109/TKDE.2009.108
69 X Lian, L Chen. Probabilistic inverse ranking queries in uncertain databases. The International Journal on Very Large Data Bases, 2011, 20(1): 107–127
https://doi.org/10.1007/s00778-010-0195-5
70 X Lian, L Chen. Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. The International Journal on Very Large Data Bases, 2009, 18(3): 787–808
https://doi.org/10.1007/s00778-008-0123-0
71 J Pei, B Jiang, X Lin, Y Yuan. Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 15–26
72 Y Yuan, G Wang. Answering probabilistic reachability queries over uncertain graphs. Chinese Journal of Computers, 2010, 33(8): 1378–1386
https://doi.org/10.3724/SP.J.1016.2010.01378
73 X Lian, L Chen. Top-k dominating queries in uncertain databases. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 660–671
https://doi.org/10.1145/1516360.1516437
74 E Grädel, Y Gurevich, C Hirsch. The complexity of query reliability. In: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 1998, 227–234
https://doi.org/10.1145/275487.295124
75 N Dalvi, D Suciu. The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the 26th ACM SIGMODSIGACT- SIGART Symposium on Principles of Database Systems. 2007, 293–302
https://doi.org/10.1145/1265530.1265571
76 R Fagin, A Lotem, M Naor. Optimal aggregation algorithms for middleware. In: Proceedings of the 20th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2001, 102–113
https://doi.org/10.1145/375551.375567
77 J Li, B Saha, A Deshpande. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment, 2009, 2(1): 502–513
https://doi.org/10.14778/1687627.1687685
78 F Li, K Yi, J Jestes. Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009, 361–374
https://doi.org/10.1145/1559845.1559885
79 X Dai, M L Yiu, N Mamoulis, Y Tao, M Vaitis. Probabilistic spatial queries on existentially uncertain data. Advances in Spatial and Temporal Databases, 2005, 400–417
https://doi.org/10.1007/11535331_23
80 M L Yiu, N Mamoulis, X Dai, Y Tao, M Vaitis. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(1): 108–122
https://doi.org/10.1109/TKDE.2008.135
81 R Cheng, D V Kalashnikov, S Prabhakar. Evaluating probabilistic queries over imprecise data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 2003, 551–562
https://doi.org/10.1145/872757.872823
82 H P Kriegel, P Kunath, M Renz. Probabilistic nearest-neighbor query on uncertain objects. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2007, 337–348
https://doi.org/10.1007/978-3-540-71703-4_30
83 X Lian, L Chen. Probabilistic inverse ranking queries over uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2009, 35–50
https://doi.org/10.1007/978-3-642-00887-0_4
84 X Lian, L Chen. Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 213–226
https://doi.org/10.1145/1376616.1376641
85 Y Tao, X Xiao, R Cheng. Range search on multidimensional uncertain data. ACM Transactions on Database Systems, 2007, 32(3): 15
https://doi.org/10.1145/1272743.1272745
86 C Bohm, A Pryakhin, M Schubert. The gauss-tree: efficient object identification in databases of probabilistic feature vectors. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 9
https://doi.org/10.1109/ICDE.2006.159
87 V Ljosa, A K Singh. APLA: indexing arbitrary probability distributions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 946–955
https://doi.org/10.1109/ICDE.2007.367940
88 R Cheng, X Xie, M L Yiu, J Chen, L Sun. UV-diagram: a voronoi diagram for uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 796–807
https://doi.org/10.1109/ICDE.2010.5447917
89 F Angiulli, F Fassetti. Indexing uncertain data in general metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9): 1640–1657
https://doi.org/10.1109/TKDE.2011.93
90 S Singh, C Mayfield, S Prabhakar, R Shah, S Hambrusch. Indexing uncertain categorical data. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 616–625
https://doi.org/10.1109/ICDE.2007.367907
91 B Kanagal, A Deshpande. Indexing correlated probabilistic databases. In: Proceedings of the 35th SIGMOD International Conference on Management of Data. 2009, 455–468
https://doi.org/10.1145/1559845.1559894
92 M Chau, R Cheng, B Kao, J Ng. Uncertain data mining: an example in clustering location data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2006, 199–204
https://doi.org/10.1007/11731139_24
93 Y Li, J Han, J Yang. Clustering moving objects. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 617–622
https://doi.org/10.1145/1014052.1014129
94 S D Lee, B Kao, R Cheng. Reducing UK-means to K-means. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 483–488
https://doi.org/10.1109/ICDMW.2007.40
95 B Kao, S D Lee, D W Cheung, W S Ho, K Chan. Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th International Conference on Data Mining. 2008, 333–342
https://doi.org/10.1109/ICDM.2008.31
96 F Dehne, H Noltemeier. Voronoi trees and clustering problems. Information Systems, 1987, 12(2): 171–175
https://doi.org/10.1016/0306-4379(87)90041-X
97 F Gullo, G Ponti, A Tagarelli. Clustering uncertain data via Kmedoids. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 229–242
https://doi.org/10.1007/978-3-540-87993-0_19
98 G Cormode, A McGregor. Approximation algorithms for clustering uncertain data. In: Proceedings of the 27th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2008, 191–200
https://doi.org/10.1145/1376916.1376944
99 H P Kriegel, M Pfeifle. Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677
https://doi.org/10.1145/1081870.1081955
100 H P Kriegel, M Pfeifle. Hierarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 4
https://doi.org/10.1109/ICDM.2005.75
101 H Xu, G Li. Density-based probabilistic clustering of uncertain data. In: Proceedings of the International Conference on Computer Science and Software Engineering. 2008, 474–477
https://doi.org/10.1109/CSSE.2008.968
102 H Hamdan, G Govaert. Mixture model clustering of uncertain data. In: Proceedings of the 14th IEEE International Conference on Fuzzy Systems. 2005, 879–884
https://doi.org/10.1109/FUZZY.2005.1452510
103 L Xiao, E Hung. An efficient distance calculation method for uncertain objects. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining. 2007, 10–17
https://doi.org/10.1109/CIDM.2007.368846
104 J Bi, T Zhang. Support vector classification with input data uncertainty. Advances in Neural Information Processing Systems, 2004, 17: 161–169
105 C Bhattacharyya, K Pannagadatta, A J Smola. A second order cone programming formulation for classifying missing data. Advances in Neural Information Processing Systems, 2005, 17: 153–160
106 J Yang, S Gunn. Exploiting uncertain data in support vector classification. In: Proceedings of the International Conference on Knowledge– Based Intelligent Information and Engineering Systems. 2007, 148–155
https://doi.org/10.1007/978-3-540-74829-8_19
107 J Yang, S Gunn. Iterative constraints in support vector classification with uncertain information. Constraint-based Mining and Learning, 2007, 49
108 F Demichelis, P Magni, P Piergiorgi, M A Rubin, R Bellazzi. A hierarchical naive bayes model for handling sample heterogeneity in classification problems: an application to tissue microarrays. BMC Bioinformatics, 2006, 7(1): 514
https://doi.org/10.1186/1471-2105-7-514
109 C K Chui, B Kao, E Hung. Mining frequent itemsets from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2007, 47–58
https://doi.org/10.1007/978-3-540-71701-0_8
110 C K Chui, B Kao. A decremental approach for mining frequent itemsets from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2008, 64–75
https://doi.org/10.1007/978-3-540-68125-0_8
111 C S Leung, C L Carmichael, B Hao. Efficient mining of frequent patterns from uncertain data. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 489–494
https://doi.org/10.1109/ICDMW.2007.84
112 C K S Leung, D A Brajczuk. Efficient mining of frequent itemsets from data streams. In: Proceedings of the British National Conference on Databases. 2008, 2–14
https://doi.org/10.1007/978-3-540-70504-8_2
113 C K S Leung, M A F Mateo, D A Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2008, 653–661
https://doi.org/10.1007/978-3-540-68125-0_61
114 K Hewawasam, K Premaratne, S Subasingha, M L Shyu. Rule mining and classification in imperfect databases. In: Proceedings of the 8th International Conference on Information Fusion. 2005, 661–668
https://doi.org/10.1109/ICIF.2005.1591917
115 M A B Tobji, B B Yaghlane, K Mellouli. A new algorithm for mining frequent itemsets from evidential databases. Proceedings of Information Processing and Management of Uncertainty. 2008, 8: 1535–1542
116 M A B Tobji, B B Yaghlane, K Mellouli. Frequent itemset mining from databases including one evidential attribute. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 19–32
https://doi.org/10.1007/978-3-540-87993-0_4
117 S Abiteboul, B Kimelfeld, Y Sagiv, P Senellart. On the expressiveness of probabilistic XML models. The International Journal on Very Large Data Bases, 2009, 18(5): 1041–1064
https://doi.org/10.1007/s00778-009-0146-1
118 T Li, Q Shao, Y Chen. PEPX: a query-friendly probabilistic XML database. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 848–849
https://doi.org/10.1145/1183614.1183761
119 A Nierman, H Jagadish. ProTDB: probabilistic data in XML. In: Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, 646–657
https://doi.org/10.1016/B978-155860869-6/50063-9
120 S Abiteboul, P Senellart. Querying and updating probabilistic information in XML. In: Proceedings of the International Conference on Extending Database Technology. 2006, 1059–1068
https://doi.org/10.1007/11687238_62
121 P Senellart, S Abiteboul. On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2007, 283–292
https://doi.org/10.1145/1265530.1265570
122 E Hung, L Getoor, V Subrahmanian. Probabilistic interval XML. In: Proceedings of International Conference on Database Theory. 2003, 361–377
https://doi.org/10.1007/3-540-36285-1_24
123 E Hung, L Getoor, V Subrahmanian. PXML: a probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 467–478
https://doi.org/10.1109/ICDE.2003.1260814
124 S Abiteboul, T H H Chan, E Kharlamov, W Nutt, P Senellart. Aggregate queries for discrete and continuous probabilistic XML. In: Proceedings of the 13th International Conference on Database Theory. 2010, 50–61
https://doi.org/10.1145/1804669.1804679
125 B Kimelfeld, Y Kosharovsky, Y Sagiv. Query efficiency in probabilistic XML models. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 701–714
https://doi.org/10.1145/1376616.1376687
126 W Zhao, A Dekhtyar, J Goldsmith. Databases for interval probabilities. International Journal of Intelligent Systems, 2004, 19(9): 789–815
https://doi.org/10.1002/int.20025
127 W Zhao, A Dekhtyar, J Goldsmith. A framework for management of semistructured probabilistic data. Journal of Intelligent Information Systems, 2005, 25(3): 293–332
https://doi.org/10.1007/s10844-005-0197-8
128 A Dekhtyar, J Goldsmith, S R Hawkes. Semistructured probabilistic databases. In: Proceedings of the 13th International Conference on Scientific and Statistical Database Management. 2001, 36–45
https://doi.org/10.1109/SSDM.2001.938536
129 E Hung. Managing uncertainty and ontologies in databases. UMD Theses and Dissertations, 2005
130 M Magnani, D Montesi. Management of interval probabilistic data. Acta Informatica, 2008, 45(2): 93–130
https://doi.org/10.1007/s00236-007-0065-9
131 S Cohen, B Kimelfeld, Y Sagiv. Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2008, 109–118
https://doi.org/10.1145/1376916.1376933
132 B Kimelfeld, Y Sagiv. Matching twigs in probabilistic XML. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 27–38
133 E Adar, C Ré. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15–22
134 P Hintsanen. The most reliable subgraph problem. In: Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 2007, 471–478
https://doi.org/10.1007/978-3-540-74976-9_48
135 P Hintsanen, H Toivonen. Finding reliable subgraphs from large probabilistic graphs. Data Mining and Knowledge Discovery, 2008, 17(1): 3–23
https://doi.org/10.1007/s10618-008-0106-1
136 Z Zou, J Li, H Gao, S Zhang. Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 583–592
https://doi.org/10.1145/1645953.1646028
137 Z Zou, J Li, H Gao, S Zhang. Mining frequent subgraph patterns from uncertain graphs. Journal of Software, 2009, 20(11): 2965–2976
https://doi.org/10.3724/SP.J.1001.2009.03473
138 Z Zou, J Li, H Gao, S Zhang. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203–1218
https://doi.org/10.1109/TKDE.2010.80
139 M Potamias, F Bonchi, A Gionis, G Kollios. K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, 2010, 3(1–2): 997–1008
https://doi.org/10.14778/1920841.1920967
140 Y Yuan, L Chen, G Wang. Efficiently answering probability thresholdbased shortest path queries over uncertain graphs. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2010, 155–170
https://doi.org/10.1007/978-3-642-12026-8_14
141 O Papapetrou, E Ioannou, D Skoutas. Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: Proceedings of the 14th International Conference on Extending Database Technology. 2011, 355–366
https://doi.org/10.1145/1951365.1951408
142 M Han, W Zhang, J Z Li. Raking: an efficient k-maximal frequent pattern mining algorithm on uncertain graph database. Chinese Journal of Computers, 2010, 33(8): 1387–1395
https://doi.org/10.3724/SP.J.1016.2010.01387
143 Z Zou, H Gao, J Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 633–642
https://doi.org/10.1145/1835804.1835885
144 Z Zou, J Li, H Gao, S Zhang. Finding top-k maximal cliques in an uncertain graph. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 649–652
https://doi.org/10.1109/ICDE.2010.5447891
145 Y Yuan, G Wang, H Wang, L Chen. Efficient subgraph search over large uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(11): 876–886
146 Y Yuan, G Wang, L Chen, H Wang. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800–811
https://doi.org/10.14778/2311906.2311908
147 M Koyutürk, A Grama, W Szpankowski. An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics, 2004, 20(Suppl 1): 200–207
https://doi.org/10.1093/bioinformatics/bth919
148 L G Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 1979, 8(3): 410–421
https://doi.org/10.1137/0208032
149 C Jin, K Yi, L Chen, J X Yu, X Lin. Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment, 2008, 1(1): 301–312
https://doi.org/10.14778/1453856.1453892
150 C Ré, J Letchner, M Balazinska, D Suciu. Event queries on correlated probabilistic streams. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 715–728
https://doi.org/10.1145/1376616.1376688
151 N Alon, Y Matias, M Szegedy. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing. 1996, 20–29
https://doi.org/10.1145/237814.237823
152 P Flajolet, G N Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209
https://doi.org/10.1016/0022-0000(85)90041-8
153 T Zhang, R Ramakrishnan, M Livny. BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record, 1996, 25(2): 103–114
https://doi.org/10.1145/235968.233324
154 C C Aggarwal, J Han, J Wang, P S Yu. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 2003, 81–92
https://doi.org/10.1016/B978-012722442-8/50016-1
155 C C Aggarwal, P S Yu. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 150–159
https://doi.org/10.1109/ICDE.2008.4497423
156 Z Li, T Ge. Online windowed subsequence matching over probabilistic sequences. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 2012, 277–288
https://doi.org/10.1145/2213836.2213868
157 X Lian, L Chen. Efficient join processing on uncertain data streams. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 857–866
https://doi.org/10.1145/1645953.1646062
158 T Ge, F Liu. Accuracy-aware uncertain stream databases. In: Proceedings of the 28th International Conference on Data Engineering. 2012, 174–185
https://doi.org/10.1109/ICDE.2012.96
159 L Peng, Y Diao, A Liu. Optimizing probabilistic query processing on continuous uncertain data. Proceedings of the VLDB Endowment, 2011, 4(11): 1169–1180
160 T S Jayram, A McGregor, S Muthukrishnan, E Vee. Estimating statistical aggregates on probabilistic data streams. In: Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2007, 243–252
https://doi.org/10.1145/1265530.1265565
161 Q Zhang, F Li, K Yi. Finding frequent items in probabilistic data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 819–832
https://doi.org/10.1145/1376616.1376698
162 C C Aggarwal, J Han, J Wang, P S Yu. On high dimensional projected clustering of data streams. Data Mining and Knowledge Discovery, 2005, 10(3): 251–273
https://doi.org/10.1007/s10618-005-0645-7
163 C Zhang, M Gao, A Zhou. Tracking high quality clusters over uncertain data streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1641–1648
https://doi.org/10.1109/ICDE.2009.160
164 W Zhang, X Lin, Y Zhang, W Wang, G Zhu, J X Yu. Probabilistic skyline operator over sliding windows. Information Systems, 2013, 38(8): 1212–1233
https://doi.org/10.1016/j.is.2012.03.002
165 S Subramaniam, T Palpanas, D Papadopoulos, V Kalogeraki, D Gunopulos. Online outlier detection in sensor data using non-parametric models. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 187–198
166 A Deshpande, C Guestrin, S R Madden, J M Hellerstein, W Hong. Model-driven data acquisition in sensor networks. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 588–599
https://doi.org/10.1016/B978-012088469-8.50053-X
167 Y Hida, P Huang, R Nishtala. Aggregation query under uncertainty in sensor networks. Technical Report, 2004
168 E Welbourne, N Khoussainova, J Letchner, Y Li, M Balazinska, G Borriello, D Suciu. Cascadia: a system for specifying, detecting, and managing RFID events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
https://doi.org/10.1145/1378600.1378631
169 B Kanagal, A Deshpande. Online filtering, smoothing and probabilistic modeling of streaming data. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 1160–1169
https://doi.org/10.1109/ICDE.2008.4497525
170 C J Zhang, L Chen, Y Tong, Z Liu. Cleaning uncertain data with a noisy crowd. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 6–17
https://doi.org/10.1109/ICDE.2015.7113268
171 L Mo, R Cheng, X Li, D W Cheung, X S Yang. Cleaning uncertain data for top-k queries. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 134–145
172 F Panse, M Van Keulen, A De Keijzer, N Ritter. Duplicate detection in probabilistic data. In: Proceedings of the 26th International Conference on Data Engineering Workshops. 2010, 179–182
https://doi.org/10.1109/ICDEW.2010.5452759
173 M Van Keulen, A De Keijzer. Qualitative effects of knowledge rules and user feedback in probabilistic data integration. The VLDB Journal, 2009, 18(5): 1191–1217
https://doi.org/10.1007/s00778-009-0156-z
174 R Cheng, J Chen, X Xie. Cleaning uncertain data with quality guarantees. Proceedings of the VLDB Endowment, 2008, 1(1): 722–735
https://doi.org/10.14778/1453856.1453935
175 X L Dong, A Halevy, C Yu. Data integration with uncertainty. The VLDB Journal, 2009, 18(2): 469–500
https://doi.org/10.1007/s00778-008-0119-9