Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (5) : 155327    https://doi.org/10.1007/s11704-020-0002-4
REVIEW ARTICLE
Sememe knowledge computation: a review of recent advances in application and expansion of sememe knowledge bases
Fanchao QI1,2,3, Ruobing XIE4, Yuan ZANG1,2,3, Zhiyuan LIU1,2,3,5, Maosong SUN1,2,3,5
1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2. Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China
3. Beijing National Research Center for Information Science and Technology, Beijing 100084, China
4. Search Product Center, WeChat Search Application Department, Tencent, Beijing 100080, China
5. Beijing Academy of Artificial Intelligence, Beijing 100191, China
Abstract

In linguistics, a sememe is defined as the minimum semantic unit of language. Sememe knowledge bases are built by manually annotating words and phrases with sememes, and HowNet is the best-known of them. HowNet was used extensively in natural language processing tasks in the era of statistical natural language processing and proved effective for understanding and using language. In the era of deep learning, although data are considered to be of vital importance, some studies incorporate sememe knowledge bases such as HowNet into neural network models to improve system performance, with successful attempts in tasks including word representation learning, language modeling, and semantic composition. In addition, because manually annotating and updating sememe knowledge bases is costly, some work uses machine learning methods to predict sememes for words and phrases automatically and thereby expand these knowledge bases. Other studies extend HowNet to new languages by automatically predicting sememes for words and phrases in those languages. In this paper, we summarize recent studies on the application and expansion of sememe knowledge bases and point out future directions for research on sememes.
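Two ideas the abstract names, incorporating sememes into word representations and automatically predicting sememes to expand a knowledge base, can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical HowNet-style lexicon (`SEMEME_KB`, mapping each word to its senses and each sense to a set of sememes) and hand-picked toy embeddings (`SEMEME_VEC`, `WORD_VEC`). The names `word_vector_from_sememes`, `predict_sememes`, and the decay parameter `c` are illustrative assumptions, not the reviewed papers' code and not the HowNet or OpenHowNet API. The first function represents a word as the mean embedding of all its sememes; the second scores candidate sememes for an unannotated word by letting annotated neighbours in embedding space vote, weighted by cosine similarity and a rank-based decay, a simple collaborative-filtering-style scheme.

```python
import math
from collections import defaultdict

# Hypothetical HowNet-style toy lexicon: word -> list of senses,
# where each sense is annotated with a set of sememes.
SEMEME_KB = {
    "apple": [{"fruit"}, {"computer", "SpeBrand"}],
    "pear": [{"fruit"}],
    "banana": [{"fruit"}],
    "laptop": [{"computer"}],
}

# Hypothetical 2-d embeddings; real models learn these from data.
SEMEME_VEC = {"fruit": [1.0, 0.0], "computer": [0.0, 1.0], "SpeBrand": [0.0, 0.5]}
WORD_VEC = {
    "apple": [0.9, 0.4],
    "pear": [1.0, 0.1],
    "banana": [0.95, 0.05],
    "laptop": [0.1, 1.0],
}


def word_vector_from_sememes(word, kb=SEMEME_KB, sememe_vec=SEMEME_VEC):
    """Represent a word as the mean embedding of the sememes of all its senses."""
    sememes = [s for sense in kb[word] for s in sense]
    dim = len(sememe_vec[sememes[0]])
    total = [0.0] * dim
    for s in sememes:
        for i, x in enumerate(sememe_vec[s]):
            total[i] += x
    return [x / len(sememes) for x in total]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_sememes(new_vec, word_vec=WORD_VEC, kb=SEMEME_KB, c=0.8, top_k=3):
    """Rank sememes for an unannotated word whose embedding is `new_vec`.

    Each annotated word w votes for its sememes with weight
    cos(new_vec, w) * c**rank, where rank 0 is the most similar word.
    """
    sims = {w: cosine(new_vec, v) for w, v in word_vec.items()}
    ranked = sorted(sims, key=sims.get, reverse=True)
    scores = defaultdict(float)
    for rank, w in enumerate(ranked):
        for s in set().union(*kb[w]):  # sememes across all senses of w
            scores[s] += sims[w] * c ** rank
    # Sort by descending score, breaking ties alphabetically.
    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in best[:top_k]]


if __name__ == "__main__":
    print(word_vector_from_sememes("apple"))  # [0.3333333333333333, 0.5]
    print(predict_sememes([1.0, 0.0]))  # ['fruit', 'computer', 'SpeBrand']
```

Real systems learn the sememe and word embeddings from corpora and usually restrict voting to the nearest neighbours; here the factor `c ** rank` simply down-weights lower-ranked, less similar words.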

Keywords: natural language processing, semantics, knowledge base, sememe, HowNet
Corresponding Author(s): Zhiyuan LIU   
Just Accepted Date: 16 July 2020   Issue Date: 10 May 2021
 Cite this article:   
Fanchao QI, Ruobing XIE, Yuan ZANG, et al. Sememe knowledge computation: a review of recent advances in application and expansion of sememe knowledge bases[J]. Front. Comput. Sci., 2021, 15(5): 155327.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0002-4
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I5/155327
Related articles
[1] Mingyang LI, Yuqing XING, Fang KONG, Guodong ZHOU. Towards better entity linking[J]. Front. Comput. Sci., 2022, 16(2): 162308.
[2] Dongjin YU, Lin WANG, Xin CHEN, Jie CHEN. Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt[J]. Front. Comput. Sci., 2021, 15(4): 154208.
[3] Lydia LAZIB, Bing QIN, Yanyan ZHAO, Weinan ZHANG, Ting LIU. A syntactic path-based hybrid neural network for negation scope detection[J]. Front. Comput. Sci., 2020, 14(1): 84-94.
[4] Xin CHEN, He JIANG, Zhenyu CHEN, Tieke HE, Liming NIE. Automatic test report augmentation to assist crowdsourced testing[J]. Front. Comput. Sci., 2019, 13(5): 943-959.
[5] Thierry GAUTIER, Clément GUY, Alexandre HONORAT, Paul LE GUERNIC, Jean-Pierre TALPIN, Loïc BESNARD. Polychronous automata and their use for formal validation of AADL models[J]. Front. Comput. Sci., 2019, 13(4): 677-697.
[6] Ying JIANG, Shichao LIU, Thomas EHRHARD. A fully abstract semantics for value-passing CCS for trees[J]. Front. Comput. Sci., 2019, 13(4): 828-849.
[7] Qingying SUN, Zhongqing WANG, Shoushan LI, Qiaoming ZHU, Guodong ZHOU. Stance detection via sentiment information and neural network model[J]. Front. Comput. Sci., 2019, 13(1): 127-138.
[8] Zhongqing WANG, Shoushan LI, Guodong ZHOU. Personal summarization from profile networks[J]. Front. Comput. Sci., 2017, 11(6): 1085-1097.
[9] Yang-Yen OU, Ta-Wen KUAN, Anand PAUL, Jhing-Fa WANG, An-Chao TSAI. Spoken dialog summarization system with HAPPINESS/SUFFERING factor recognition[J]. Front. Comput. Sci., 2017, 11(3): 429-443.
[10] Xiao PAN, Weizhang CHEN, Lei WU, Chunhui PIAO, Zhaojun HU. Protecting personalized privacy against sensitivity homogeneity attacks over road networks in mobile services[J]. Front. Comput. Sci., 2016, 10(2): 370-386.
[11] Yang ZHANG, Xinyu FENG. An operational happens-before memory model[J]. Front. Comput. Sci., 2016, 10(1): 54-81.
[12] Yanhong HUANG, Jifeng HE, Huibiao ZHU, Yongxin ZHAO, Jianqi SHI, Shengchao QIN. Semantic theories of programs with nested interrupts[J]. Front. Comput. Sci., 2015, 9(3): 331-345.
[13] Xiaoxiao YANG, Yu ZHANG, Ming FU, Xinyu FENG. A temporal programming model with atomic blocks based on projection temporal logic[J]. Front. Comput. Sci., 2014, 8(6): 958-976.
[14] Qin LI, Yongxin ZHAO, Huibiao ZHU, Jifeng HE. A UTP semantic model for Orc language with execution status and fault handling[J]. Front. Comput. Sci., 2014, 8(5): 709-725.
[15] Zengchang QIN, Tao WAN. Hybrid Bayesian estimation tree learning with discrete and fuzzy labels[J]. Front. Comput. Sci., 2013, 7(6): 852-863.