Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2016, Vol. 10 Issue (6) : 1026-1038    https://doi.org/10.1007/s11704-016-4503-0
RESEARCH ARTICLE
Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation
Ruifeng XU1, Lin GUI1, Qin LU2, Shuai WANG1, Jian XU2
1. Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
2. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Abstract

Person name disambiguation aims to distinguish different entities that share the same person name by linking each document to the entity it mentions. Traditional approaches use the words in a document as features to tell entities apart. Because they ignore word order and make little use of external knowledge, their performance is limited. This paper presents an entity-linking approach to Chinese person name disambiguation based on a multi-kernel function and Internet verification. The approach extends a linear kernel over in-document word features with a string kernel to form a multi-kernel function. The multi-kernel computes the similarity between an input document and each entity description in a person name knowledge base, producing a ranked list of candidate entities. In addition, Internet search results retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge improves both precision and recall, and that the system achieves state-of-the-art performance.
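
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which string kernel or kernel weights are used, so the sketch assumes a cosine-normalised p-spectrum kernel over contiguous word p-grams as a simple stand-in for the string kernel, and an equal-weight convex combination with the linear kernel. All function names, the `weight` and `p` parameters, and the toy knowledge base are hypothetical. Input is assumed to be already tokenised (Chinese text would first need word segmentation), and the Internet verification stage is not covered.

```python
from collections import Counter
from math import sqrt


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def linear_kernel(tokens_a, tokens_b):
    """Bag-of-words similarity: sensitive to shared words, blind to word order."""
    return _cosine(Counter(tokens_a), Counter(tokens_b))


def spectrum_kernel(tokens_a, tokens_b, p=2):
    """Assumed stand-in string kernel: similarity over contiguous word p-grams,
    so it rewards words that appear in the same order in both texts."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + p]) for i in range(len(tokens) - p + 1))
    return _cosine(grams(tokens_a), grams(tokens_b))


def multi_kernel(tokens_a, tokens_b, weight=0.5, p=2):
    """Convex combination of the two kernels; `weight` is an assumed parameter."""
    return (weight * linear_kernel(tokens_a, tokens_b)
            + (1.0 - weight) * spectrum_kernel(tokens_a, tokens_b, p))


def rank_candidates(document, knowledge_base, weight=0.5, p=2):
    """Return (entity_id, score) pairs sorted from most to least similar."""
    scores = {
        entity_id: multi_kernel(document, description, weight, p)
        for entity_id, description in knowledge_base.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up data: one input document and two entities sharing a name.
document = "li ming teaches physics at peking university".split()
knowledge_base = {
    "e1": "li ming is a physics professor at peking university".split(),
    "e2": "li ming plays football for a beijing club".split(),
}
ranking = rank_candidates(document, knowledge_base)
```

In this toy example the first entry of `ranking` is the entity whose description shares both more words and more word sequences with the input document. A convex combination is used because a non-negative weighted sum of valid kernels is itself a valid kernel.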

Keywords: Chinese person name disambiguation; Internet verification; string kernel; multi-kernel function; machine learning
Corresponding Author(s): Qin LU   
Just Accepted Date: 17 November 2015   Online First Date: 01 June 2016    Issue Date: 11 October 2016
 Cite this article:   
Ruifeng XU, Lin GUI, Qin LU, et al. Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation[J]. Front. Comput. Sci., 2016, 10(6): 1026-1038.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-4503-0
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I6/1026