Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci    2012, Vol. 6 Issue (3) : 278-292    https://doi.org/10.1007/s11704-012-2007-0
RESEARCH ARTICLE
Matching dependencies: semantics and query answering
Jaffer GARDEZI1, Leopoldo BERTOSSI2(), Iluju KIRINGA1
1. School of Information Technology and Engineering, University of Ottawa, Ottawa ON K1N 6N5, Canada; 2. School of Computer Science, Carleton University, Ottawa ON K1S 5B6, Canada
 Download: PDF(384 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be used for the value in common that is used for a matching. However, the overall number of changes of attribute values is expected to be kept to a minimum. We investigate this case in terms of semantics and the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. The complexity of computing clean instances and clean query answering is investigated. Tractable and intractable cases depending on the MDs are identified and characterized.

Keywords databases      data cleaning      duplicate and entity resolution      integrity constraints      matching dependencies     
Corresponding Author(s): BERTOSSI Leopoldo,Email:bertossi@scs.carleton.ca   
Issue Date: 01 June 2012
 Cite this article:   
Jaffer GARDEZI,Leopoldo BERTOSSI,Iluju KIRINGA. Matching dependencies: semantics and query answering[J]. Front Comput Sci, 2012, 6(3): 278-292.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-012-2007-0
https://academic.hep.com.cn/fcs/EN/Y2012/V6/I3/278
1 Elmagarmid A, Ipeirotis P, Verykios V. Duplicate record detection: a survey. IEEE Transactions on Knowledge and Data Engineering , 2007, 19(1): 1-16
doi: 10.1109/TKDE.2007.250581
2 Bleiholder J, Naumann F. Data fusion. ACM Computing Surveys , 2008, 41(1): 1-41
doi: 10.1145/1456650.1456651
3 Benjelloun O, Garcia-Molina H, Menestrina D, Su Q, Whang S, Widom J. Swoosh: a generic approach to entity resolution. The VLDB Journal , 2009, 18: 255-276
doi: 10.1007/s00778-008-0098-x
4 Fan W. Dependencies revisited for improving data quality. In: Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems . 2008, 159-170
5 Fan W, Jia X, Li J, Ma S. Reasoning about record matching rules. Proceedings of the VLDB Endowment , 2009, 2(1): 407-418
6 Arenas M, Bertossi L, Chomicki J. Consistent query answers in inconsistent databases. In: Proceedings of the 18th ACM SIGMODSIGACTSIGART Symposium on Principles of Database Systems . 1999, 68-79
7 Bertossi L. Consistent queryanswering in databases. ACM Sigmod Record , 2006, 35(2): 68-76
doi: 10.1145/1147376.1147391
8 Chomicki J. Consistent query answering: five easy pieces. In: Proceedings of the 11th International Conference on Database Theory. LNCS 4353. Springer , 2007, 1-17
9 Bertossi L. Database repairing and consistent query answering. Synthesis Lectures on Data Management. Morgan & Claypool , 2011
10 Bertossi L, Bravo L. Consistent query answers in virtual data integration systems. In: Bertossi L, Hunter A, Schaub T, eds. Inconsistency Tolerance. LNCS 3300 . Berlin: Springer, 2005, 42-83
doi: 10.1007/978-3-540-30597-2_3
11 Bertossi L, Kolahi S, Lakshmanan L. Data cleaning and query answering with matching dependencies and matching functions. In: Proceedings of the 14th International Conference on Database Theory . 2011, 268-279
12 Gardezi J, Bertossi L, Kiringa I. Matching dependencies with arbitrary attribute values: semantics, query answering and integrity constraints. In: Proceedings of the 4th International Workshop on Logic in Databases . 2011, 23-30
13 Abiteboul S, Hull R, Vianu V. Foundations of Databases. 1st edition. Addison-Wesley , 1995
14 Bertossi L, Kolahi S, Lakshmanan L. Data cleaning and query answering with matching dependencies and matching functions. Theory of Computing Systems , 2012 (in press)
doi: 10.1007/s00224-012-9402-7
15 Franconi1 E, Palma A, Leone N, Perri S, Scarcello F. Census data repair: A challenging application of disjunctive logic programming. In: Nieuwenhuis R, Voronkov A, eds. Logic for Programming, Artificial Intelligence, and Reasoning. LNCS 2250 . Berlin: Springer, 2001, 561-578
16 Flesca S, Furfaro F, Parisi F. Querying and repairing inconsistent numerical databases. ACM Transactions on Database Systems , 2010, 35(2): 14
doi: 10.1145/1735886.1735893
17 Wijsen J. Database repairing using updates. ACM Transactions on Database Systems , 2005, 30(3): 722-768
doi: 10.1145/1093382.1093385
18 Bertossi L, Bravo L, Franconi E, Lopatenko A. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Information Systems , 2008, 33(4): 407-434
doi: 10.1016/j.is.2008.01.005
19 Barceló P. Logical foundations of relational data exchange. ACM SIGMOD Record , 2009, 38(1): 49-58
doi: 10.1145/1558334.1558341
20 Bahmani Z, Bertossi L, Kolahi S, Lakshmanan L V S. Declarative entity resolution via matching dependencies and answer set programs. In: Proceedings of the 13th International Conference on Principles of Knowledge Representation and Reasoning . 2012 (in press)
21 Gardezi J, Bertossi L. Query answering under matching dependencies for data cleaning: Complexity and algorithms. Arxiv preprint arXiv: 1112. 5908 , 2011
22 Garey M, Johnson D. Computers and Intractability: A Guide to the Theory of NP-completeness. WH Freeman & Co , 1979
23 Goldreich O. Computational Complexity: A Conceptual Perspective. 1st edition. New York: Cambridge University Press, 2008
24 Papadimitriou C. Computational complexity. Addison-Wesley , 1994
25 Gelfond M, Lifschitz V. Classical negation in logic programs and disjunctive databases. New Generation Computing , 1991, 9(3): 365-385
doi: 10.1007/BF03037169
26 Brewka G, Eiter T, Truszczy’nski M. Answer set programming at a glance. Communications of the ACM , 2011, 54(12): 92-103
doi: 10.1145/2043174.2043195
27 Buccafurri F, Leone N, Rullo P. Enhancing disjunctive datalog by constraints. IEEE Transactions on Knowledge and Data Engineering , 2000, 12(5): 845-860
doi: 10.1109/69.877512
28 Fan W, Li J, Ma S, Tang N, Yu W. Towards certain fixes with editing rules and master data. Proceedings of the VLDB Endowment , 2010, 3(1-2): 173-184
[1] Ran LIU, Wenjian LUO, Lihua YUE. Classifying and clustering in negative databases[J]. Front Comput Sci, 2013, 7(6): 864-874.
[2] Dantong OUYANG, Xianji CUI, Yuxin YE. Integrity constraints in OWL ontologies based on grounded circumscription[J]. Front Comput Sci, 2013, 7(6): 812-821.
[3] Jinchuan CHEN, Yueguo CHEN, Xiaoyong DU, Cuiping LI, Jiaheng LU, Suyun ZHAO, Xuan ZHOU. Big data challenge: a data management perspective[J]. Front Comput Sci, 2013, 7(2): 157-164.
[4] Xiaoming WANG, Guoxiang YAO. Access control scheme with tracing for outsourced databases[J]. Front Comput Sci, 2012, 6(6): 677-685.
[5] HAO Guoshun, MA Shilong, LV Jianghua, SUI Yuefei. Dynamic description logic model for data integration[J]. Front. Comput. Sci., 2008, 2(3): 306-330.
[6] TANG Yuanyan. Remarks on different reviews of Chinese character recognition[J]. Front. Comput. Sci., 2007, 1(2): 123-125.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed