Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (4) : 174609    https://doi.org/10.1007/s11704-022-2041-5
LETTER
A novel dense retrieval framework for long document retrieval
Jiajia WANG1,2,3, Weizhong ZHAO2,3, Xinhui TU2,3, Tingting HE2,3
1. School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
2. Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
3. National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
Corresponding Author(s): Xinhui TU, Tingting HE
Just Accepted Date: 19 August 2022   Issue Date: 03 November 2022
 Cite this article:   
Jiajia WANG, Weizhong ZHAO, Xinhui TU, et al. A novel dense retrieval framework for long document retrieval[J]. Front. Comput. Sci., 2023, 17(4): 174609.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2041-5
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I4/174609
Fig.1  The schematic representation of DRSCM
                            GOV2                                             Robust04
Model                       P@10    P@20    NDCG@10  NDCG@20  Latency/ms     P@10    P@20    NDCG@10  NDCG@20  Latency/ms
Bag-of-words
  BM25 (Anserini)           0.5775  0.5381  0.4856   0.4784   132            0.4382  0.3631  0.4485   0.4240   88
BERT-based models
  Birch (MS MARCO) [5]      –       –       –        –        –              0.4578  0.3964  0.4645   0.4512   290,000
  Vanilla_BERT [3]          0.5666  0.5483  0.4714   0.4670   129,963        0.4633  0.4050  0.4750   0.4685   58,400
  CEDR_KNRM [3]             0.5746  0.5437  0.4618   0.4626   475,940        0.4936  0.4175  0.5101   0.4832   146,000
Aggregation method "2sum"
  DRSCM (SBERT)             0.6919  0.6335  0.5810   0.5637   55             0.5008  0.4098  0.5157   0.4816   34
  DRSCM (RepBERT)           0.6924  0.6300  0.5832   0.5620   28             0.5008  0.4237  0.5192   0.4941   11
Aggregation method "3sum"
  DRSCM (SBERT)             0.6918  0.6284  0.5814   0.5606   55             0.5064  0.4205  0.5238   0.4919   34
  DRSCM (RepBERT)           0.6870  0.6300  0.5828   0.5633   28             0.4972  0.4133  0.5140   0.4836   11
Aggregation method "Max"
  DRSCM (SBERT)             0.6898  0.6270  0.5878   0.5653   55             0.5016  0.4088  0.5177   0.4821   34
  DRSCM (RepBERT)           0.6924  0.6290  0.5840   0.5625   28             0.4980  0.4229  0.5118   0.4897   11
Aggregation method "Mean"
  DRSCM (SBERT)             0.6233  0.5807  0.5377   0.5211   55             0.4731  0.3932  0.4873   0.4601   34
  DRSCM (RepBERT)           0.6313  0.5828  0.5408   0.5228   28             0.4707  0.3974  0.4809   0.4594   11
Tab.1  Comparison of DRSCM with two different dense retrieval models and four aggregation strategies on GOV2 and Robust04
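The four aggregation strategies in Tab.1 combine per-passage relevance scores into a single document-level score. The exact definitions are not stated on this page; a plausible reading is that "Max" takes the best passage score, "Mean" averages all passage scores, and "2sum"/"3sum" sum the top-2 or top-3 passage scores. Under that assumption, the strategies can be sketched as:

```python
def aggregate(passage_scores, method="max", k=2):
    """Combine per-passage relevance scores into one document score.

    A sketch of one plausible reading of the strategies in Tab.1
    (the exact definitions are not given on this page):
      "Max"         -> highest passage score
      "Mean"        -> average over all passage scores
      "2sum"/"3sum" -> sum of the top-2 / top-3 passage scores
    """
    if not passage_scores:
        raise ValueError("document has no scored passages")
    ranked = sorted(passage_scores, reverse=True)
    if method == "max":
        return ranked[0]
    if method == "mean":
        return sum(ranked) / len(ranked)
    if method == "topk_sum":  # "2sum" with k=2, "3sum" with k=3
        return sum(ranked[:k])
    raise ValueError(f"unknown aggregation method: {method!r}")
```

Because top-k sums reward documents with a few strongly relevant passages while "Mean" is diluted by irrelevant passages, this reading would be consistent with "Mean" trailing the other three strategies in Tab.1.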
1 J. Devlin, M. W. Chang, K. Lee, K. Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. 2019, 4171–4186
2 O. Khattab, M. Zaharia. ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020, 39–48
3 N. Reimers, I. Gurevych. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 3982–3992
4 J. Zhan, J. Mao, Y. Liu, M. Zhang, S. Ma. RepBERT: contextualized text embeddings for first-stage retrieval. 2020, arXiv preprint arXiv: 2006.15498
5 L. Zhu, Y. Yang. ActBERT: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 8743–8752
6 Z. A. Yilmaz, S. Wang, W. Yang, H. Zhang, J. Lin. Applying BERT to document retrieval with birch. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 19–24
7 M. Li, E. Gaussier. KeyBLD: selecting key blocks with local pre-ranking for long document information retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021, 2207–2211