Syntactic word embedding based on dependency syntax and polysemous analysis

doi:10.1631/FITEE.1601846

Frontiers of Information Technology & Electronic Engineering

2018, Vol. 19, Issue 4: 524-535 https://doi.org/10.1631/FITEE.1601846


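Zhong-lin YE (冶忠林)¹, Hai-xing ZHAO (赵海兴)¹,²

¹. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
². School of Information Engineering, Chang’an University, Xi’an 710064, China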

Syntactic word embedding based on dependency syntax and polysemous analysis

Zhong-lin YE¹, Hai-xing ZHAO¹,²

¹. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
². School of Computer, Qinghai Normal University, Xining 810800, China

Full text: PDF (653 KB)

Abstract:

Most existing word embedding models share three problems: (1) models based on bag-of-words contexts completely ignore the syntactic structure of sentences; (2) each word is assigned a single embedding, so the different senses of a polysemous word share one vector; (3) the learned embeddings tend to reflect the contextual similarity of sentences. To address these problems, we propose syntactic word embedding (SWE), an easy-to-use representation algorithm based on dependency syntax and polysemous analysis. The main procedures are: (1) a polysemous word tagging algorithm based on a topic model, latent Dirichlet allocation (LDA), is used to represent polysemous words; (2) the symbols ‘+’ and ‘−’ indicate the direction of each dependency; (3) stopwords and their dependencies are deleted; (4) a ‘skip’ dependency is introduced to connect indirect dependencies; (5) the dependency-based contexts are input to a word2vec model for training. Experimental results show that SWE performs well in word similarity evaluation tasks. In addition, dependency-based syntactic contexts capture both semantic and syntactic features of words, so the resulting embeddings exhibit less topical similarity and more syntactic and semantic similarity. We conclude that SWE, which incorporates more information, outperforms single-embedding learning models.
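Procedures (2) to (5) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentence, the hand-written parse, the stopword list, the `word/relation±` context format, the choice of which direction gets ‘+’, and the reading of "skip" as a bridge over a deleted stopword are all assumptions made for illustration. The polysemous tagging step (1) is not covered.

```python
from collections import defaultdict

# Toy sentence and a hand-written dependency parse (both hypothetical).
# Each arc is (head_index, relation, dependent_index) into TOKENS.
TOKENS = ["the", "scientist", "discovers", "a", "star", "with", "the", "telescope"]
ARCS = [
    (1, "det", 0),
    (2, "nsubj", 1),
    (2, "dobj", 4),
    (4, "det", 3),
    (2, "prep", 5),
    (5, "pobj", 7),
    (7, "det", 6),
]
STOPWORDS = {"the", "a", "with"}  # assumed list, for illustration only


def dependency_contexts(tokens, arcs, stopwords):
    """Map each content word to its direction-marked dependency contexts."""
    # Index the dependents of every token so a deleted stopword can be bridged.
    children = defaultdict(list)
    for head, rel, dep in arcs:
        children[head].append(dep)

    kept = []
    for head, rel, dep in arcs:
        if tokens[head] in stopwords:
            # Arcs leaving a stopword are handled when the arc that points
            # at that stopword is visited.
            continue
        if tokens[dep] in stopwords:
            # Delete the stopword and link its non-stopword children to the
            # head through an assumed "skip" relation.
            for grandchild in children[dep]:
                if tokens[grandchild] not in stopwords:
                    kept.append((head, "skip", grandchild))
        else:
            kept.append((head, rel, dep))

    contexts = defaultdict(list)
    for head, rel, dep in kept:
        # Assumed convention: '+' marks a context that is the word's
        # dependent, '-' marks a context that is the word's head.
        contexts[tokens[head]].append(f"{tokens[dep]}/{rel}+")
        contexts[tokens[dep]].append(f"{tokens[head]}/{rel}-")
    return dict(contexts)


contexts = dependency_contexts(TOKENS, ARCS, STOPWORDS)
# Flatten to (word, context) pairs, the form a word2vec-style trainer
# that accepts arbitrary contexts would consume.
pairs = [(word, ctx) for word, ctxs in contexts.items() for ctx in ctxs]
```

In this sketch the preposition "with" is removed and "telescope" is attached directly to "discovers" through the skip relation, so the two content words still appear in each other's contexts after stopword deletion.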

Key words: Dependency-based context; Polysemous word representation; Representation learning; Syntactic word embedding

Received: 2016-12-21  Published: 2018-06-28

Corresponding Author(s): Hai-xing ZHAO, E-mail: h.x.zhao@163.com

Cite this article:

Zhong-lin YE, Hai-xing ZHAO. Syntactic word embedding based on dependency syntax and polysemous analysis. Front. Inform. Technol. Electron. Eng, 2018, 19(4): 524-535.

Link to this article:

https://academic.hep.com.cn/fitee/CN/10.1631/FITEE.1601846
https://academic.hep.com.cn/fitee/CN/Y2018/V19/I4/524
