Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (1) : 151307    https://doi.org/10.1007/s11704-020-9240-8
RESEARCH ARTICLE
Entity set expansion in knowledge graph: a heterogeneous information network perspective
Chuan SHI¹, Jiayu DING¹, Xiaohuan CAO¹, Linmei HU¹, Bin WU¹, Xiaoli LI²
1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore 138632, Singapore
Abstract

Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities which have common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods employed knowledge graphs (KGs) to extend entities. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relation among seed entities in a KG and thus to retrieve candidate entities. Then we rank the entities according to the meta path based structural similarity. Furthermore, to utilize the text description of entities in Wikipedia, we propose an extended model CoMeSE++ which combines both structural information revealed by a KG and text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
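
As a rough illustration of the retrieve-then-rank idea described in the abstract, the sketch below treats a KG as a list of (head, relation, tail) triples, keeps the length-2 meta paths of the form r followed by its inverse that connect pairs of seed entities, and ranks non-seed entities by weighted counts of path instances. This is a simplified toy under stated assumptions, not the authors' CoMeSE++ model: the triples, the `expand` function, the restriction to length-2 paths, and the weighting scheme are all hypothetical and chosen only for readability.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples; all data is made up.
TRIPLES = [
    ("Inception", "directed_by", "Nolan"),
    ("Interstellar", "directed_by", "Nolan"),
    ("Dunkirk", "directed_by", "Nolan"),
    ("Titanic", "directed_by", "Cameron"),
    ("Inception", "starring", "DiCaprio"),
    ("Titanic", "starring", "DiCaprio"),
    ("Interstellar", "starring", "McConaughey"),
]


def build_index(triples):
    """Index the triples for forward and backward traversal."""
    out_edges = defaultdict(set)  # (head, relation) -> set of tails
    in_edges = defaultdict(set)   # (relation, tail) -> set of heads
    for head, rel, tail in triples:
        out_edges[(head, rel)].add(tail)
        in_edges[(rel, tail)].add(head)
    return out_edges, in_edges


def expand(seeds, triples):
    """Rank non-seed entities reachable from the seeds via meta paths r, r^-1.

    Assumption (not from the paper): a meta path's weight is the fraction of
    seed pairs it connects, and a candidate's score is the weighted number of
    path instances linking it to the seeds.
    """
    out_edges, in_edges = build_index(triples)
    relations = sorted({rel for _, rel, _ in triples})
    seeds = set(seeds)
    pairs = [(a, b) for a in seeds for b in seeds if a < b]
    scores = defaultdict(float)
    for rel in relations:
        # Count seed pairs that share at least one tail under this relation.
        connected = sum(
            1 for a, b in pairs
            if out_edges.get((a, rel), set()) & out_edges.get((b, rel), set())
        )
        if not pairs or connected == 0:
            continue  # this meta path does not explain the seed set
        weight = connected / len(pairs)
        # Follow the meta path from each seed to collect candidates.
        for seed in seeds:
            for mid in out_edges.get((seed, rel), set()):
                for cand in in_edges.get((rel, mid), set()):
                    if cand not in seeds:
                        scores[cand] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(expand(["Inception", "Interstellar"], TRIPLES))
```

With the seeds "Inception" and "Interstellar", only the `directed_by` path connects the seed pair, so "Dunkirk" is retrieved while "Titanic" (which shares only an actor with one seed) is not. A fuller treatment would also consider longer and typed paths and, as the abstract notes, textual evidence.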

Keywords: entity set expansion, knowledge graph, heterogeneous information network, multi-type meta path
Corresponding Author(s): Linmei HU   
Just Accepted Date: 27 December 2019   Issue Date: 24 September 2020
 Cite this article:   
Chuan SHI, Jiayu DING, Xiaohuan CAO, et al. Entity set expansion in knowledge graph: a heterogeneous information network perspective[J]. Front. Comput. Sci., 2021, 15(1): 151307.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-9240-8
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I1/151307