Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2016, Vol. 10 Issue (3) : 477-487    https://doi.org/10.1007/s11704-015-5222-7
RESEARCH ARTICLE
Optimizing top-k retrieval: submodularity analysis and search strategies
Chaofeng SHA1,*(),Keqiang WANG2,Dell ZHANG3,Xiaoling WANG2,Aoying ZHOU2
1. School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
2. Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China
3. Department of Computer Science and Information Systems, Birkbeck, University of London, LondonWC1E 7HX, UK
 Download: PDF(403 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1-1e for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.

Keywords top-k retrieval      diversification      submodular function maximization     
Corresponding Author(s): Chaofeng SHA   
Online First Date: 06 April 2016    Issue Date: 16 May 2016
 Cite this article:   
Chaofeng SHA,Keqiang WANG,Dell ZHANG, et al. Optimizing top-k retrieval: submodularity analysis and search strategies[J]. Front. Comput. Sci., 2016, 10(3): 477-487.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-015-5222-7
https://academic.hep.com.cn/fcs/EN/Y2016/V10/I3/477
1 Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008
2 Chen H, Karger D R. Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006, 429–436
3 Carbonell J G, Goldstein J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998, 335–336
4 Zhai C, Cohen W W, Lafferty J D. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In: Proceedings of the 26th Annal International ACM SIGIR Conference on Research and Development in Information Retrieval. 2003, 10–17
5 Wang J, Zhu J. Portfolio theory of information retrieval. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2009, 115–122
6 Zuccon G, Azzopardi L. Using the quantum probability ranking principle to rank interdependent documents. In: Proceedings of the 32th European Conference on Information Retrieval Research. 2010, 357–369
7 Chandar P, Carterette B. Diversification of search results using webgraphs. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2010, 869–870
8 Santos R L T, Macdonald C, Ounis I. Intent-aware search result diversification. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011, 595–604
9 Zuccon G, Azzopardi L, Zhang D,Wang J. Top-k retrieval using facility location analysis. In: Proceedings of the 34th European Conference on Information Retrieval Research. 2012, 305–316
10 Gonzalez T F. Handbook of Approximation Algorithms and Metaheuristics. Boca Raton: CRC Press , 2007
11 Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 2009
12 Sha C,Wang K, Zhang D,Wang X, Zhou A. Optimizing top-k retrieval: submodularity analysis and search strategies. In: Proceedings of the 15th International Conference on Web-Age Information Management. 2014, 18–29
13 Nemhauser G, Wolsey L, Fisher M. An analysis of approximations for maximizing submodular set functions —I. Mathematical Programming, 1978, 14(1): 265–294
14 Agrawal R, Gollapudi S, Halverson A, Ieong S. Diversifying searchresults. In: Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. 2009, 5–14
15 He J, Hollink V, de Vries A P. Combining implicit and explicit topic representations for result diversification. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 851–860
16 Santos R L T, Macdonald C, Ounis I. Exploiting query reformulations for Web search result diversification. In: Proceedings of the 19th International World Wide Web Conference. 2010, 881–890
17 Vallet D, Castells P. Personalized diversification of search results. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 841–850
18 Vargas S, Castells P, Vallet D. Explicit relevance models in intentoriented information retrieval diversification. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 75–84
19 Gollapudi S, Sharma A. An axiomatic approach for result diversification. In: Proceedings of the 18th International Conference on World Wide Web. 2009, 381–390
20 Krause A, Golovin D. Submodular function maximization. Tractability: Practical Approaches to Hard Problems, 2012, 3: 19
21 Lin H, Bilmes J. A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011, 510–520
22 Chapelle O, Metlzer D, Zhang Y, Grinspan P. Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 621–630
23 Clarke C L A, Kolla M, Cormack G V, Vechtomova O, Ashkan A, Buttcher S, MacKinnon I. Novelty and diversity in information retrieval evaluation. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2008, 659–666
24 Krause A, Guestrin C. Near-optimal nonmyopic value of information in graphical models. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. 2005, 324–331
25 Kempe D, Kleinberg J, Tardos E. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 137–146
[1]  Supplementary Material Download
[1] Meifan ZHANG, Hongzhi WANG, Jianzhong LI, Hong GAO. Diversification on big data in query processing[J]. Front. Comput. Sci., 2020, 14(4): 144607-.
[2] Wanyu CHEN, Fei CAI, Honghui CHEN, Maarten DE RIJKE. Personalized query suggestion diversification in information retrieval[J]. Front. Comput. Sci., 2020, 14(3): 143602-.
[3] Rong ZHANG,Wenzhe YU,Chaofeng SHA,Xiaofeng HE,Aoying ZHOU. Product-oriented review summarization and scoring[J]. Front. Comput. Sci., 2015, 9(2): 210-223.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed