Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (4) : 688-701    https://doi.org/10.1007/s11704-016-5265-4
RESEARCH ARTICLE
Discovering context-aware conditional functional dependencies
Yuefeng DU(), Derong SHEN, Tiezheng NIE, Yue KOU, Ge YU
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
 Download: PDF(603 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Conditional functional dependencies(CFDs) are important techniques for data consistency. However, CFDs are limited to 1) provide the reasonable values for consistency repairing and 2) detect potential errors. This paper presents context-aware conditional functional dependencies(CCFDs) which contribute to provide reasonable values and detect potential errors. Especially, we focus on automatically discovering minimal CCFDs. In this paper, we present context relativity to measure the relationship of CFDs. The overlap of the related CFDs can provide reasonable values which result in more accuracy consistency repairing, and some related CFDs are combined into CCFDs.Moreover,we prove that discovering minimal CCFDs is NP-complete and we design the precise method and the heuristic method. We also present the dominating value to facilitate the process in both the precise method and the heuristic method. Additionally, the context relativity of the CFDs affects the cleaning results. We will give an approximate threshold of context relativity according to data distribution for suggestion. The repairing results are approvedmore accuracy, even evidenced by our empirical evaluation.

Keywords conditional functional dependencies      context aware      rules discovery     
Corresponding Author(s): Yuefeng DU   
Just Accepted Date: 16 March 2016   Online First Date: 17 March 2017    Issue Date: 26 July 2017
 Cite this article:   
Yuefeng DU,Derong SHEN,Tiezheng NIE, et al. Discovering context-aware conditional functional dependencies[J]. Front. Comput. Sci., 2017, 11(4): 688-701.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5265-4
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I4/688
1 BittonD, Millman J, TorgersenS . A feasibility and performance study of dependency inference (database design). In: Proceedings of the 5th International Conference on Data Engineering. 1989, 635–641
https://doi.org/10.1109/icde.1989.47271
2 AbiteboulS, HullR, VianuV. Foundations of Databases. Boston: Addison-Wesley, 1995
3 KivinenJ, Mannila H. Approximate inference of functional dependencies from relations. Theoretical Computer Science, 1995, 149(1): 129–149
https://doi.org/10.1016/0304-3975(95)00028-U
4 MaherM. Constrained dependencies. Theoretical Computer Science, 1997, 173(1): 113–149
https://doi.org/10.1016/S0304-3975(96)00193-4
5 FanW F, GeertsF, JiaX B, Kementsietsidis A. Conditional functional dependencies for capturing data inconsistencies. ACM Transactions on Database Systems (TODS), 2008, 33(2): 1–44
https://doi.org/10.1145/1366102.1366103
6 FanW F, GeertsF. Foundations of Data Quality Management. San Rafael, Calif: Morgan and Claypool, 2012
7 BravoL, FanW F, GeertsF, Ma S. Increasing the expressivity of conditional functional dependencies without extra complexity. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 516–525
https://doi.org/10.1109/icde.2008.4497460
8 RamanV, Hellerstein J. Potter’s wheel: an interactive data cleaning system. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001, 381–390
9 IlyasI, MarklV, HaasP, Brown P, AboulnagaA . Cords: automatic discovery of correlations and soft functional dependencies. In: Proceedings of the 30th ACM SIGMOD International Conference on Management of Data. 2004, 647–658
https://doi.org/10.1145/1007568.1007641
10 MayfieldC, Neville J, PrabhakarS . Eracer: a database approach for statistical inference and data cleaning. In: Proceedings of the 36th ACM SIGMOD International Conference on Management of Data. 2010, 75–86
https://doi.org/10.1145/1807167.1807178
11 DallachiesaM, EbaidA, EldawyA, Elmagarmid A, IlyasI , OuzzaniM, TangN. Nadeef: a commodity data cleaning system. In: Proceedings of the 39th ACM SIGMOD International Conference on Management of Data. 2013, 541–552
https://doi.org/10.1145/2463676.2465327
12 BohannonP, FanW F, FlasterM, Rastogi R. A cost-based model and effective heuristic for repairing constraints by value modification. In: Proceedings of the 31st ACM SIGMOD International Conference on Management of Data. 2005, 143–154
https://doi.org/10.1145/1066157.1066175
13 MaS, FanW F, BravoL. Extending inclusion dependencies with conditions. Theoretical Computer Science, 2014, 515: 64–95
https://doi.org/10.1016/j.tcs.2013.11.002
14 FanW F, GeertsF, LiJ Z, Xiong M. Discovering conditional functional dependencies. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(5): 683–698
https://doi.org/10.1109/TKDE.2010.154
15 CormenH, Leiserson C, RivestR , SteinC. Introduction to algorithms. Cambridge: MIT Press, 2001
16 CongG, FanW F, GeertsF, Jia X, MaS . Improving data quality: consistency and accuracy. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 315–326
17 ChuX, IlyasI, PapottiP. Discovering denial constraints. Proceedings of the VLDB Endowment, 2013, 6(13): 1498–1509
https://doi.org/10.14778/2536258.2536262
18 FanW F, GeertsF, TangN, Yu W Y. Inferring data currency and consistency for conflict resolution. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 470–481
19 CaoY, FanW F, YuW Y. Determining the relative accuracy of attributes. In: Proceedings of the 39th ACM SIGMOD International Conference on Management of Data. 2013, 565–576
https://doi.org/10.1145/2463676.2465309
20 HaasL, Hernández M, HoH , PopaL, RothM. Clio grows up: from research prototype to industrial tool. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 805–810
https://doi.org/10.1145/1066157.1066252
21 MaS, DuanL, FanW F, Hu C, ChenW G . Extending conditional dependencies with built-in predicates. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(12): 3274–3288
22 ChenW G, FanW F, MaS. Incorporating cardinality constraints and synonym rules into conditional functional dependencies. Information Processing Letters, 2009, 109(14): 783–789
https://doi.org/10.1016/j.ipl.2009.03.021
23 HuhtalaY, Kärkkäinen J, PorkkaP , ToivonenH. Tane: an efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 1999, 42(2): 100–111
https://doi.org/10.1093/comjnl/42.2.100
24 HuhtalaY, Karkkainen J, PorkkaP , ToivonenH. Efficient discovery of functional and approximate dependencies using partitions. In: Proceedings of the 4th International Conference on Data Engineering. 1998, 392–401
https://doi.org/10.1109/ICDE.1998.655802
25 ChiangF, MillerR. Discovering data quality rules. Proceedings of the VLDB Endowment, 2008, 1(1): 1166–1177
https://doi.org/10.14778/1453856.1453980
26 ChiangF, MillerR.A unified model for data and constraint repair. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 446–457
https://doi.org/10.1109/icde.2011.5767833
27 FanW F, MaS, TangN, Yu W Y. Interaction between record matching and data repairing. Journal of Data and Information Quality (JDIQ), 2014, 4(4): 1–16
https://doi.org/10.1145/2567657
28 WangJ N, TangN. Towards dependable data repairing with fixing rules. In: Proceedings of the 40th ACM SIGMOD International Conference on Management of Data. 2014, 457–468
https://doi.org/10.1145/2588555.2610494
29 InterlandiM, TangN. Proof positive and negative in data cleaning. In: Proceedings of the 31st International Conference on Data Engineering. 2015, 18–29
https://doi.org/10.1109/icde.2015.7113269
[1] FCS-0688-15265-XFD_suppl_1 Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed