|
|
Optimizing top-k retrieval: submodularity analysis and search strategies |
Chaofeng SHA1,*( ),Keqiang WANG2,Dell ZHANG3,Xiaoling WANG2,Aoying ZHOU2 |
1. School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China 2. Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China 3. Department of Computer Science and Information Systems, Birkbeck, University of London, LondonWC1E 7HX, UK |
|
|
Abstract The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1-1e for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.
|
Keywords
top-k retrieval
diversification
submodular function maximization
|
Corresponding Author(s):
Chaofeng SHA
|
Online First Date: 06 April 2016
Issue Date: 16 May 2016
|
|
1 |
Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008
|
2 |
Chen H, Karger D R. Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006, 429–436
|
3 |
Carbonell J G, Goldstein J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998, 335–336
|
4 |
Zhai C, Cohen W W, Lafferty J D. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In: Proceedings of the 26th Annal International ACM SIGIR Conference on Research and Development in Information Retrieval. 2003, 10–17
|
5 |
Wang J, Zhu J. Portfolio theory of information retrieval. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2009, 115–122
|
6 |
Zuccon G, Azzopardi L. Using the quantum probability ranking principle to rank interdependent documents. In: Proceedings of the 32th European Conference on Information Retrieval Research. 2010, 357–369
|
7 |
Chandar P, Carterette B. Diversification of search results using webgraphs. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2010, 869–870
|
8 |
Santos R L T, Macdonald C, Ounis I. Intent-aware search result diversification. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011, 595–604
|
9 |
Zuccon G, Azzopardi L, Zhang D,Wang J. Top-k retrieval using facility location analysis. In: Proceedings of the 34th European Conference on Information Retrieval Research. 2012, 305–316
|
10 |
Gonzalez T F. Handbook of Approximation Algorithms and Metaheuristics. Boca Raton: CRC Press , 2007
|
11 |
Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 2009
|
12 |
Sha C,Wang K, Zhang D,Wang X, Zhou A. Optimizing top-k retrieval: submodularity analysis and search strategies. In: Proceedings of the 15th International Conference on Web-Age Information Management. 2014, 18–29
|
13 |
Nemhauser G, Wolsey L, Fisher M. An analysis of approximations for maximizing submodular set functions —I. Mathematical Programming, 1978, 14(1): 265–294
|
14 |
Agrawal R, Gollapudi S, Halverson A, Ieong S. Diversifying searchresults. In: Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. 2009, 5–14
|
15 |
He J, Hollink V, de Vries A P. Combining implicit and explicit topic representations for result diversification. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 851–860
|
16 |
Santos R L T, Macdonald C, Ounis I. Exploiting query reformulations for Web search result diversification. In: Proceedings of the 19th International World Wide Web Conference. 2010, 881–890
|
17 |
Vallet D, Castells P. Personalized diversification of search results. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 841–850
|
18 |
Vargas S, Castells P, Vallet D. Explicit relevance models in intentoriented information retrieval diversification. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 75–84
|
19 |
Gollapudi S, Sharma A. An axiomatic approach for result diversification. In: Proceedings of the 18th International Conference on World Wide Web. 2009, 381–390
|
20 |
Krause A, Golovin D. Submodular function maximization. Tractability: Practical Approaches to Hard Problems, 2012, 3: 19
|
21 |
Lin H, Bilmes J. A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011, 510–520
|
22 |
Chapelle O, Metlzer D, Zhang Y, Grinspan P. Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 621–630
|
23 |
Clarke C L A, Kolla M, Cormack G V, Vechtomova O, Ashkan A, Buttcher S, MacKinnon I. Novelty and diversity in information retrieval evaluation. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2008, 659–666
|
24 |
Krause A, Guestrin C. Near-optimal nonmyopic value of information in graphical models. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. 2005, 324–331
|
25 |
Kempe D, Kleinberg J, Tardos E. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 137–146
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|