Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation
Ruifeng XU1, Lin GUI1, Qin LU2, Shuai WANG1, Jian XU2
1. Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
2. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Abstract
Person name disambiguation aims to distinguish different entities that share the same person name by linking each document to the entity it refers to. The traditional disambiguation approach uses the words in documents as features to distinguish entities. Because it does not use word order as a feature and makes only limited use of external knowledge, the traditional approach has limited performance. This paper presents a named entity disambiguation approach based on entity linking with a multi-kernel function and Internet verification to improve Chinese person name disambiguation. The proposed approach extends a linear kernel over in-document word features by adding a string kernel, forming a multi-kernel function. This multi-kernel function calculates the similarities between an input document and the entity descriptions in a named person knowledge base to produce a ranked list of candidate entities. Furthermore, Internet search results retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge through a multi-kernel function can improve both precision and recall, and that our system has achieved state-of-the-art performance.
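The ranking step described in the abstract combines a linear kernel, which ignores word order, with a string kernel, which does not. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation: documents are assumed to be already segmented into word lists, the string kernel is approximated by a normalized word-bigram spectrum kernel (the abstract does not specify the paper's exact string kernel), and the mixing weight `alpha`, the function names, and the toy knowledge base are hypothetical.

```python
from collections import Counter
from math import sqrt


def _cosine(counts_a, counts_b):
    """Cosine-normalized dot product of two sparse count vectors."""
    dot = sum(counts_a[k] * counts_b[k] for k in counts_a)
    norm = sqrt(sum(c * c for c in counts_a.values())) * sqrt(
        sum(c * c for c in counts_b.values()))
    return dot / norm if norm else 0.0


def linear_kernel(doc_a, doc_b):
    """Linear kernel on bag-of-words counts; word order is ignored."""
    return _cosine(Counter(doc_a), Counter(doc_b))


def spectrum_kernel(doc_a, doc_b, n=2):
    """Normalized word n-gram spectrum kernel; sensitive to local word order.

    Assumption: a stand-in for the paper's string kernel.
    """
    grams_a = Counter(tuple(doc_a[i:i + n]) for i in range(len(doc_a) - n + 1))
    grams_b = Counter(tuple(doc_b[i:i + n]) for i in range(len(doc_b) - n + 1))
    return _cosine(grams_a, grams_b)


def multi_kernel(doc_a, doc_b, alpha=0.5):
    """Convex combination of the two kernels (hypothetical weight alpha in [0, 1])."""
    return alpha * linear_kernel(doc_a, doc_b) + (1 - alpha) * spectrum_kernel(doc_a, doc_b)


def rank_candidates(doc, knowledge_base, alpha=0.5):
    """Score every entity description and return (entity_id, score), best first."""
    scores = [(eid, multi_kernel(doc, desc, alpha))
              for eid, desc in knowledge_base.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, pre-segmented knowledge base (hypothetical entity IDs and descriptions).
    kb = {
        "E1": "professor of computer science at the university".split(),
        "E2": "football player for the city club".split(),
    }
    query = "the professor of computer science gave a talk".split()
    for entity_id, score in rank_candidates(query, kb):
        print(entity_id, round(score, 3))  # E1 ranks first
```

Reversing a document's word order leaves `linear_kernel` unchanged but can drive `spectrum_kernel` to zero, which is the kind of word-order information the multi-kernel function is meant to add. Because a convex combination of valid kernels is itself a valid kernel, the combined score could also be fed to a kernel-based classifier.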
Keywords
Chinese person name disambiguation
Internet verification
string kernel
multi-kernel function
machine learning

Corresponding Author(s):
Qin LU

Just Accepted Date: 17 November 2015
Online First Date: 01 June 2016
Issue Date: 11 October 2016