Front. Comput. Sci.    2017, Vol. 11 Issue (1) : 147-159
A genetic algorithm based entity resolution approach with active learning
Chenchen SUN, Derong SHEN, Yue KOU, Tiezheng NIE, Ge YU
School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Entity resolution is a key aspect of data quality and data integration: it identifies which records in data sources correspond to the same real-world entity. Many existing approaches rely on manually designed match rules, which require domain knowledge and are time-consuming to produce. We propose a novel genetic algorithm based entity resolution approach with active learning. It learns effective match rules by logically combining comparisons on several different attributes, each with a proper threshold. We use active learning to reduce the amount of manually labeled data and to speed up the learning process. An extensive evaluation shows that the proposed approach outperforms state-of-the-art entity resolution approaches in accuracy.
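
To make the notion of a match rule concrete, the following is a minimal sketch of a rule built as a logical combination of thresholded attribute comparisons. It is an illustration only, not the paper's implementation: the similarity measure (Python's `difflib` ratio), the helper names `comparison`, `all_of`, and `any_of`, the attribute names, and the threshold values are all assumptions chosen for the example. In the proposed approach such rules would be learned by the genetic algorithm rather than written by hand.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison(attribute: str, threshold: float):
    """Build a predicate that is True when two records are at least
    `threshold`-similar on `attribute`."""
    def predicate(r1: dict, r2: dict) -> bool:
        return similarity(r1[attribute], r2[attribute]) >= threshold
    return predicate


def all_of(*predicates):
    """Logical AND of several comparisons."""
    return lambda r1, r2: all(p(r1, r2) for p in predicates)


def any_of(*predicates):
    """Logical OR of several comparisons."""
    return lambda r1, r2: any(p(r1, r2) for p in predicates)


# Hypothetical hand-written rule (thresholds are illustrative):
# (title >= 0.8 AND authors >= 0.6) OR title >= 0.95
rule = any_of(
    all_of(comparison("title", 0.8), comparison("authors", 0.6)),
    comparison("title", 0.95),
)

a = {"title": "A genetic algorithm based entity resolution approach",
     "authors": "C. Sun, D. Shen"}
b = {"title": "A Genetic Algorithm Based Entity Resolution Approach",
     "authors": "Chenchen Sun, Derong Shen"}
c = {"title": "Deep learning for image segmentation",
     "authors": "J. Doe"}

same_entity = rule(a, b)       # titles differ only in letter case
different_entity = rule(a, c)  # unrelated titles
```

A rule represented this way is just a tree of comparisons joined by AND/OR, which is the kind of structure a genetic algorithm can mutate and recombine while searching for attribute choices and thresholds.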

Keywords entity resolution      genetic algorithm      active learning      data quality      data integration     
Corresponding Author(s): Chenchen SUN   
Just Accepted Date: 02 November 2015   Online First Date: 19 April 2016    Issue Date: 11 January 2017
Chenchen SUN, Derong SHEN, Yue KOU, et al. A genetic algorithm based entity resolution approach with active learning[J]. Front. Comput. Sci., 2017, 11(1): 147-159.
