Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2018, Vol. 12 Issue (1) : 163-176    https://doi.org/10.1007/s11704-016-5560-0
RESEARCH ARTICLE
Strength Pareto fitness assignment for pseudo-relevance feedback: application to MEDLINE
Ilyes KHENNAK, Habiba DRIAS
Laboratory for Research in Artificial Intelligence, Computer Science Department, University of Sciences and Technology Houari Boumediene (USTHB), Algiers 16111, Algeria
Abstract

Because users increasingly describe their information needs with unclear and imprecise keywords, it has become necessary to expand their original search queries with additional words that better capture their actual intent. The selection of suitable expansion terms generally depends on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating this relatedness: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents in which the query keywords appear; (2) proximity, where more importance is assigned to terms that lie a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances retrieval performance compared to the baseline.
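
As an illustration of how two such criteria can be combined through Pareto dominance, the following is a minimal sketch and not the authors' implementation. It assumes an SPEA2-style raw fitness (the strength of a term is the number of terms it dominates; its fitness is the summed strength of the terms that dominate it, so lower is better). The exact formulation used in the paper may differ. The candidate terms and their two scores are invented toy values, with both scores treated as higher-is-better.

```python
from typing import Dict, Tuple

# term -> (co_occurrence_score, proximity_score); both are assumed higher-is-better.
Scores = Dict[str, Tuple[float, float]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def strength_pareto_fitness(scores: Scores) -> Dict[str, int]:
    """SPEA2-style raw fitness (an assumption of this sketch); 0 means non-dominated."""
    terms = list(scores)
    # Strength: how many other candidates each term dominates.
    strength = {
        t: sum(dominates(scores[t], scores[u]) for u in terms if u != t)
        for t in terms
    }
    # Raw fitness: sum of the strengths of every candidate that dominates the term.
    return {
        t: sum(strength[u] for u in terms if u != t and dominates(scores[u], scores[t]))
        for t in terms
    }


# Hypothetical candidate expansion terms with made-up scores (not data from the paper).
candidates: Scores = {
    "insulin": (0.9, 0.8),
    "glucose": (0.7, 0.9),
    "therapy": (0.6, 0.5),
    "patient": (0.8, 0.3),
    "study": (0.4, 0.2),
}

fitness = strength_pareto_fitness(candidates)
# Lower fitness first; ties keep insertion order because sorted() is stable.
ranked = sorted(candidates, key=lambda t: fitness[t])
print(ranked)
```

In this toy setting, "insulin" and "glucose" are both non-dominated: each wins on one criterion, so neither is preferred over the other, which is the point of satisfying both criteria simultaneously rather than collapsing them into a single weighted score.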

Keywords: information retrieval, query expansion, pseudo-relevance feedback, proximity, multi-objective optimization, Pareto dominance, MEDLINE
Corresponding Author(s): Ilyes KHENNAK   
Just Accepted Date: 28 September 2016   Online First Date: 08 December 2017    Issue Date: 12 January 2018
 Cite this article:   
Ilyes KHENNAK,Habiba DRIAS. Strength Pareto fitness assignment for pseudo-relevance feedback: application to MEDLINE[J]. Front. Comput. Sci., 2018, 12(1): 163-176.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5560-0
https://academic.hep.com.cn/fcs/EN/Y2018/V12/I1/163