Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2010, Vol. 4, Issue 1: 100-111. https://doi.org/10.1007/s11704-009-0064-9
Research articles
A constraint-based topic modeling approach for name disambiguation
Feng WANG, Jie TANG, Juanzi LI, Kehong WANG
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract Name ambiguity is the problem that different people may be referred to by an identical name. It has become critical in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although much work has addressed this problem, many challenges remain. This paper proposes a general framework of constraint-based topic modeling that uses user-defined constraints to improve the performance of name disambiguation. A Gibbs sampling algorithm that integrates the constraints is proposed for inference in the topic model. Experimental results on a real-world dataset show that the proposed approach yields significant improvements.
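To make the idea of Gibbs sampling with constraints more concrete, the following is a minimal sketch only, not the algorithm from the paper. It runs standard collapsed Gibbs sampling for an LDA-style topic model and, as an assumed stand-in for user-defined constraints, multiplies each topic's sampling weight by a soft bonus when must-linked documents already favor that topic. The function name `constrained_gibbs`, the `must_link` pair list, the bonus strength `lam`, and the form of the bonus are all hypothetical choices made for illustration.

```python
import math
import random


def constrained_gibbs(docs, must_link, n_topics, vocab_size,
                      alpha=0.1, beta=0.01, lam=2.0, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for an LDA-style model with a soft
    must-link bonus (an illustrative assumption, not the paper's method).

    docs: list of non-empty documents, each a list of word ids in [0, vocab_size).
    must_link: list of (doc_a, doc_b) pairs assumed to share topics.
    """
    rng = random.Random(seed)

    # Adjacency list of must-linked documents.
    linked = {d: [] for d in range(len(docs))}
    for a, b in must_link:
        linked[a].append(b)
        linked[b].append(a)

    # Count tables: document-topic, topic-word, and topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                weights = []
                for t in range(n_topics):
                    # Standard collapsed LDA conditional (unnormalized).
                    p = ((n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                         / (n_k[t] + vocab_size * beta))
                    # Assumed soft constraint: reward topics that the
                    # must-linked documents currently use.
                    agree = sum(n_dk[o][t] / len(docs[o]) for o in linked[d])
                    weights.append(p * math.exp(lam * agree))

                # Sample a new topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return z, n_dk
```

In a name disambiguation setting, each document could be a paper carrying the ambiguous author name, and a must-link pair could come from a user asserting that two papers belong to the same person. Setting `lam=0.0` recovers unconstrained collapsed Gibbs sampling, which makes it easy to compare the two behaviors on toy data.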
Keywords name disambiguation, constraint, topic model
Issue Date: 05 March 2010
 Cite this article:   
Feng WANG,Juanzi LI,Jie TANG, et al. A constraint-based topic modeling approach for name disambiguation[J]. Front. Comput. Sci., 2010, 4(1): 100-111.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-009-0064-9
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I1/100