A constraint-based topic modeling approach for name disambiguation
Feng WANG, Jie TANG, Juanzi LI, Kehong WANG
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract Name ambiguity is the problem that different people may be referred to by an identical name. This problem has become critical in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although much work has been conducted to address it, many challenges remain. This paper proposes a general framework of constraint-based topic modeling, which can make use of user-defined constraints to enhance the performance of name disambiguation. A Gibbs sampling algorithm that integrates the constraints is proposed for inference in the topic model. Experimental results on a real-world dataset show that the proposed approach yields significant improvements.

Keywords: name disambiguation, constraint, topic model

Issue Date: 05 March 2010