ProSyno: context-free prompt learning for synonym discovery

doi:10.1007/s11704-024-3900-z

Front. Comput. Sci.

2025, Vol. 19

Issue 6: 196317 https://doi.org/10.1007/s11704-024-3900-z

Artificial Intelligence


Song ZHANG¹,², Lei HE³, Dong WANG³, Hongyun BAO¹, Suncong ZHENG³, Yuqiao LIU¹,², Baihua XIAO¹, Jiayue LI⁵, Dongyuan LU⁴, Nan ZHENG¹,²

¹. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
². School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China
³. Tencent AI Platform Department, Beijing 100048, China
⁴. University of International Business and Economics, Beijing 100029, China
⁵. Beijing Academy of Blockchain and Edge Computing, Beijing 100085, China


Abstract

Synonym discovery is important for a wide variety of concept-related tasks, such as entity/concept mining and industrial knowledge graph (KG) construction. It aims to determine whether two terms semantically refer to the same concept. Existing methods rely on contexts or KGs, which makes them impractical when neither contexts nor KGs are available. This paper therefore proposes ProSyno, a context-free, prompt-learning-based synonym discovery method that uses Wiktionary, the world’s largest freely available dictionary, as its semantic source. Built on a pre-trained language model (PLM), ProSyno employs prompt learning so that it generalizes to other datasets without any fine-tuning. As a result, the model is better suited to context-free settings and can be easily transferred to other fields. Experimental results demonstrate its superiority over state-of-the-art methods.
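The abstract frames synonym discovery as a context-free prompt-learning problem: the model sees only the two terms and their dictionary descriptions, not any surrounding corpus text. A minimal sketch of that general formulation follows, under loudly stated assumptions: the cloze template, the yes/no verbalizer, the toy glosses, and the names `build_prompt`, `predict_synonym`, and `toy_scorer` are all hypothetical, and `toy_scorer` is a word-overlap stand-in for a masked language model so the example runs with the standard library alone. It is not ProSyno's actual template, encoder, or training procedure.

```python
from typing import Callable, Dict, List

# Hypothetical toy glosses standing in for dictionary (e.g. Wiktionary) entries.
GLOSSES: Dict[str, str] = {
    "heart attack": "sudden blockage of blood flow to the heart muscle",
    "myocardial infarction": "sudden blockage of blood flow to the heart muscle tissue",
    "anxiety": "a feeling of worry and unease about what might happen",
}

# Hypothetical verbalizer: label words that may fill the mask slot -> class they signal.
VERBALIZER: Dict[str, bool] = {"yes": True, "no": False}

SEP = " || "  # separator used by the hypothetical template below

Scorer = Callable[[str, List[str]], Dict[str, float]]


def build_prompt(term_a: str, term_b: str, glosses: Dict[str, str],
                 mask_token: str = "[MASK]") -> str:
    """Fill a cloze template with the two terms and their descriptions only (no context)."""
    desc_a = glosses.get(term_a, term_a)  # fall back to the bare term if no gloss exists
    desc_b = glosses.get(term_b, term_b)
    return (f"{term_a} means: {desc_a}{SEP}{term_b} means: {desc_b}"
            f"{SEP}Same meaning? {mask_token}.")


def predict_synonym(term_a: str, term_b: str, glosses: Dict[str, str],
                    score_fn: Scorer) -> bool:
    """Score each verbalizer word at the mask slot and return the class of the best one."""
    prompt = build_prompt(term_a, term_b, glosses)
    scores = score_fn(prompt, list(VERBALIZER))
    best = max(scores, key=lambda w: scores[w])
    return VERBALIZER[best]


def toy_scorer(prompt: str, label_words: List[str]) -> Dict[str, float]:
    """Stand-in for a masked language model (NOT a PLM): scores "yes" by the
    Jaccard word overlap between the two description segments of the prompt."""
    seg_a, seg_b, _ = prompt.split(SEP)
    words_a = set(seg_a.split(":", 1)[1].lower().split())
    words_b = set(seg_b.split(":", 1)[1].lower().split())
    overlap = len(words_a & words_b) / max(1, len(words_a | words_b))
    raw = {"yes": overlap, "no": 1.0 - overlap}
    return {w: raw.get(w, 0.0) for w in label_words}


print(predict_synonym("heart attack", "myocardial infarction", GLOSSES, toy_scorer))
print(predict_synonym("heart attack", "anxiety", GLOSSES, toy_scorer))
```

In a real prompt-learning setup, `score_fn` would be a PLM returning probabilities for the label words at the mask position; keeping the scorer as a parameter is what lets the same template and verbalizer be reused on a new dataset without changing the model.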

Keywords: synonym discovery; prompt learning; large language model

Corresponding Author(s): Jiayue LI, Dongyuan LU, Nan ZHENG

Just Accepted Date: 13 May 2024
Issue Date: 03 July 2024

Cite this article:

Song ZHANG, Lei HE, Dong WANG, et al. ProSyno: context-free prompt learning for synonym discovery[J]. Front. Comput. Sci., 2025, 19(6): 196317.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-3900-z
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I6/196317

Fig. 1 An example showing how a word description in Wiktionary helps distinguish synonyms

Fig. 2 (a) Overview of ProSyno; (b) hierarchical semantic encoder

Tab. 1 Comparison of ProSyno with state-of-the-art methods (%)

Tab. 2 Ablation study (%)

Tab. 3 Case study of a sampled term (“anxiety disorders”) illustrating the effect of the dynamic matching mechanism. The word-description weights of the term are shown for the positive candidate (“zero stress tolerance”) and the negative candidate (“cardiac arrhythmia”).

Tab. 4 Further analysis (%)

Tab. 5 Generalization performance (%)
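Tab. 3 reports word-description weights for one term that change depending on which candidate it is matched against. The following is a minimal sketch of how candidate-dependent weights of this kind can arise, assuming a generic scaled dot-product attention (the softmax scoring used in Transformers) of a candidate embedding over the term's per-word description embeddings. This is an assumption, not the authors' dynamic matching mechanism; the 3-dimensional vectors and the names `description_weights` and `pooled_term_vector` are hypothetical, and none of the numbers correspond to Tab. 3.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def description_weights(desc_vecs: Dict[str, Vector], candidate: Vector) -> Dict[str, float]:
    """Scaled dot-product attention of a candidate over a term's per-word descriptions."""
    words = list(desc_vecs)
    scale = math.sqrt(len(candidate))
    scores = [dot(desc_vecs[w], candidate) / scale for w in words]
    return dict(zip(words, softmax(scores)))


def pooled_term_vector(desc_vecs: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted sum of description vectors: the term representation seen by this candidate."""
    dim = len(next(iter(desc_vecs.values())))
    return [sum(weights[w] * desc_vecs[w][k] for w in desc_vecs) for k in range(dim)]


# Hypothetical toy description embeddings for the words of "anxiety disorders".
TERM = {"anxiety": [0.9, 0.1, 0.0], "disorders": [0.1, 0.2, 0.9]}
POSITIVE = [1.0, 0.2, 0.0]  # toy embedding for "zero stress tolerance"
NEGATIVE = [0.0, 0.3, 1.0]  # toy embedding for "cardiac arrhythmia"

w_pos = description_weights(TERM, POSITIVE)
w_neg = description_weights(TERM, NEGATIVE)
print(w_pos, w_neg)
print(pooled_term_vector(TERM, w_pos))
```

With these toy vectors, the positive candidate puts more weight on the "anxiety" description and the negative candidate on the "disorders" description, so the same term yields a different pooled representation for each candidate.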

