Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2025, Vol. 19 Issue (6) : 196317    https://doi.org/10.1007/s11704-024-3900-z
Artificial Intelligence
ProSyno: context-free prompt learning for synonym discovery
Song ZHANG1,2, Lei HE3, Dong WANG3, Hongyun BAO1, Suncong ZHENG3, Yuqiao LIU1,2, Baihua XIAO1, Jiayue LI5, Dongyuan LU4, Nan ZHENG1,2
1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China
3. Tencent AI Platform Department, Beijing 100048, China
4. University of International Business and Economics, Beijing 100029, China
5. Beijing Academy of Blockchain and Edge Computing, Beijing 100085, China
Abstract

Synonym discovery is important in a wide variety of concept-related tasks, such as entity/concept mining and industrial knowledge graph (KG) construction. It aims to determine whether two terms refer to the same concept. Existing methods rely on contexts or KGs, and are therefore impractical when neither is available. This paper proposes ProSyno, a context-free synonym discovery method based on prompt learning, which takes Wiktionary, the world’s largest freely available dictionary, as its semantic source. Built on a pre-trained language model (PLM), the prompt learning method generalizes to other datasets without any fine-tuning. The model is thus better suited to context-free situations and can be easily transferred to other fields. Experimental results demonstrate its superiority over state-of-the-art methods.

Keywords: synonym discovery, prompt learning, large language model
Corresponding Author(s): Jiayue LI, Dongyuan LU, Nan ZHENG
Just Accepted Date: 13 May 2024   Issue Date: 03 July 2024
 Cite this article:   
Song ZHANG, Lei HE, Dong WANG, et al. ProSyno: context-free prompt learning for synonym discovery[J]. Front. Comput. Sci., 2025, 19(6): 196317.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-3900-z
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I6/196317
Fig.1  An example showing that a word description in Wiktionary helps to distinguish synonyms
Fig.2  (a) Overview of ProSyno; (b) hierarchical semantic encoder
Datasets AAP TwADR-L CADEC ANV
WordCNN 81.41 44.78 - -
WordGRU 85.71 - - -
BERT 87.46 47.02 - -
BioBERT 88.39 48.32 - -
CODER - - 59.01 -
MoE-ASD 87.30 47.65 58.24 92.16
ProSyno 90.22 51.49 60.55 94.43
Tab.1  Comparisons of ProSyno against state-of-the-art performances (%)
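To make the comparison in Tab.1 concrete, the following minimal sketch computes ProSyno's margin over the best reported baseline on each dataset. The scores are copied from Tab.1; the dictionary layout, the function name `margin_over_best_baseline`, and the choice to treat entries with no reported value as `None` are illustrative assumptions, not part of the paper.

```python
# Scores (%) copied from Tab.1; None marks entries with no reported value.
results = {
    "WordCNN": {"AAP": 81.41, "TwADR-L": 44.78, "CADEC": None, "ANV": None},
    "WordGRU": {"AAP": 85.71, "TwADR-L": None, "CADEC": None, "ANV": None},
    "BERT": {"AAP": 87.46, "TwADR-L": 47.02, "CADEC": None, "ANV": None},
    "BioBERT": {"AAP": 88.39, "TwADR-L": 48.32, "CADEC": None, "ANV": None},
    "CODER": {"AAP": None, "TwADR-L": None, "CADEC": 59.01, "ANV": None},
    "MoE-ASD": {"AAP": 87.30, "TwADR-L": 47.65, "CADEC": 58.24, "ANV": 92.16},
    "ProSyno": {"AAP": 90.22, "TwADR-L": 51.49, "CADEC": 60.55, "ANV": 94.43},
}


def margin_over_best_baseline(results, system="ProSyno"):
    """Return, per dataset, the system's score minus the best baseline score."""
    margins = {}
    for dataset, score in results[system].items():
        # Collect every baseline that reports a score on this dataset.
        baselines = [
            scores[dataset]
            for name, scores in results.items()
            if name != system and scores[dataset] is not None
        ]
        margins[dataset] = round(score - max(baselines), 2)
    return margins


margins = margin_over_best_baseline(results)
for dataset, margin in margins.items():
    print(f"{dataset}: +{margin:.2f} points")
```

On these numbers the margin is largest on TwADR-L and smallest on CADEC, where CODER is the strongest baseline.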
Datasets AAP TwADR-L
ProSyno-P 85.96 48.13
ProSyno-SP 86.35 49.10
ProSyno-TP 88.19 50.32
ProSyno-RTP 88.45 51.22
ProSyno-CAT 87.35 49.30
ProSyno-MLP 86.03 46.95
ProSyno-BL 88.92 50.21
ProSyno-MP 89.24 51.35
ProSyno 90.22 51.49
Tab.2  Ablation study (%)
Word  Word description  Positive (“zero stress tolerance”)  Negative (“cardiac arrhythmia”)
Anxiety  An unpleasant state of mental uneasiness, nervousness, apprehension and obsession or concern about some uncertain event.  0.6632  0.3521
Anxiety  An uneasy or distressing desire (for something).  0.2034  0.3215
Anxiety  A state of restlessness and agitation, often accompanied by a distressing sense of oppression or tightness in the stomach.  0.1334  0.3264
Disorders  Absence of order; state of not being arranged in an orderly manner.  0.2180  0.3422
Disorders  A disturbance of civic peace or of public order.  0.2397  0.3600
Disorders  A physical or mental malfunction.  0.5423  0.2978
Tab.3  Case studies of a sampled term (“anxiety disorders”) on the effect of the dynamic matching mechanism. Word description weights of the term for the positive candidate (“zero stress tolerance”) and the negative candidate (“cardiac arrhythmia”) are shown
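In Tab.3, the weights of a word's descriptions sum to 1 for each candidate, which is consistent with a softmax-style normalization over that word's descriptions. The sketch below illustrates such a weighting under loud assumptions: descriptions and candidates are represented by toy vectors, relevance is a plain dot product, and the function names `description_weights` and `weighted_representation` are hypothetical. It is not the paper's dynamic matching implementation.

```python
import math


def description_weights(candidate_vec, description_vecs):
    """Softmax over dot-product scores between a candidate and each description."""
    scores = [
        sum(c * d for c, d in zip(candidate_vec, vec)) for vec in description_vecs
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_representation(weights, description_vecs):
    """Combine description vectors into one word vector using the weights."""
    dim = len(description_vecs[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, description_vecs))
        for i in range(dim)
    ]


# Toy 2-d vectors, made up for illustration only (not taken from the paper).
descriptions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
candidate = [2.0, 0.0]

weights = description_weights(candidate, descriptions)
word_vec = weighted_representation(weights, descriptions)
print([round(w, 4) for w in weights])
```

Under this kind of scheme, a description that aligns with the candidate receives most of the weight, while an unrelated candidate spreads the weight more evenly, which matches the pattern in the positive and negative columns of Tab.3.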
Datasets AAP TwADR-L
ProSyno-RAND 89.35 51.22
ProSyno-BERT 88.45 47.23
ProSyno-FT 89.95 51.30
ProSyno 90.22 51.49
Tab.4  Further analysis (%)
Datasets TwADR-L CADEC ANV
WordCNN 44.78 - -
WordGRU - - -
BERT 47.02 - -
BioBERT 48.32 - -
CODER - 59.01 -
MoE-ASD 47.65 58.24 92.16
ProSyno-AA 50.03 58.92 93.57
Tab.5  Performance of generalization (%)