Frontiers of Computer Science

Front. Comput. Sci. 2019, Vol. 13, Issue 5: 1048-1061. https://doi.org/10.1007/s11704-018-7056-6
RESEARCH ARTICLE
Patent expanded retrieval via word embedding under composite-domain perspectives
Fei WANG1,2, Tieyun QIAN1,2, Bin LIU1,2, Zhiyong PENG1,2
1. School of Computer Science, Wuhan University, Wuhan 430072, China
2. State Key Lab of Software Engineering, Wuhan University, Wuhan 430072, China
Abstract

Patent prior art search must retrieve all relevant documents from a massive patent database using dispersed and highly ambiguous information. This challenging task consists of patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, which results in ambiguous queries. Work on patent expansion draws expansion terms from external resources by selecting words with either a similar distribution or similar semantics, which separates the distribution of a term from its semantics. In addition, common repositories hardly meet the needs of patent expansion, which involves uncommon semantics and unusual terms. To solve these problems, we first present a novel composite-domain perspective model that converts the technical characteristics of a query patent into a specific composite classified domain and generates aspect queries. We then implement patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose to train semantic vector spaces via word embedding under the specific classified domains, so as to provide a domain-aware expansion resource. Finally, multiple retrieval results for the same topic are merged based on the perspective weight and the rank in the results. Our experimental results on CLEF-IP 2010 demonstrate that our method is effective: it achieves about a 5.43% improvement in recall and nearly a 12.38% improvement in PRES over the state of the art. Our work also achieves the best performance balance in terms of recall, MAP, and PRES.
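
The final merging step can be illustrated with a small sketch. The abstract states only that results are merged based on perspective weight and rank, without giving a formula, so the weighted reciprocal-rank scoring below is an assumption rather than the authors' method. The function name `merge_ranked_lists`, the perspective names, the weights, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def merge_ranked_lists(results, weights):
    """Fuse several ranked lists into a single ranking.

    results: dict mapping a perspective name to a list of document IDs,
             ordered best first.
    weights: dict mapping the same perspective names to a weight.

    ASSUMPTION: each document scores sum(weight / rank) over the lists that
    contain it, with ranks starting at 1. The paper's actual merging formula
    is not given in the abstract.
    """
    scores = defaultdict(float)
    for perspective, ranked_docs in results.items():
        weight = weights[perspective]
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] += weight / rank
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical aspect-query results for one query patent.
aspect_results = {
    "domain_A": ["EP1", "EP2", "EP3"],
    "domain_B": ["EP2", "EP4", "EP1"],
}
aspect_weights = {"domain_A": 0.7, "domain_B": 0.3}

merged = merge_ranked_lists(aspect_results, aspect_weights)
print(merged)
```

In this sketch a document that appears in several aspect lists accumulates score from each, so agreement across perspectives pushes it up the merged ranking, while the weight controls how much each perspective contributes.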

Keywords: patent retrieval, composite-domain perspective, double-consistency expansion, word embedding
Corresponding Author(s): Tieyun QIAN, Zhiyong PENG
Just Accepted Date: 25 September 2017; Online First Date: 22 October 2018; Issue Date: 25 June 2019
Cite this article:
Fei WANG, Tieyun QIAN, Bin LIU, et al. Patent expanded retrieval via word embedding under composite-domain perspectives[J]. Front. Comput. Sci., 2019, 13(5): 1048-1061.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7056-6
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I5/1048