Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (4) : 174609    https://doi.org/10.1007/s11704-022-2041-5
LETTER
A novel dense retrieval framework for long document retrieval
Jiajia WANG1,2,3, Weizhong ZHAO2,3, Xinhui TU2,3, Tingting HE2,3
1. School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
2. Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
3. National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
Corresponding Author(s): Xinhui TU, Tingting HE
Just Accepted Date: 19 August 2022   Issue Date: 03 November 2022
 Cite this article:   
Jiajia WANG, Weizhong ZHAO, Xinhui TU, et al. A novel dense retrieval framework for long document retrieval[J]. Front. Comput. Sci., 2023, 17(4): 174609.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2041-5
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I4/174609
Fig.1  The schematic representation of DRSCM
                            GOV2                                             Robust04
Model                       P@10    P@20    NDCG@10  NDCG@20  Latency/ms     P@10    P@20    NDCG@10  NDCG@20  Latency/ms
Bag-of-words
  BM25 (Anserini)           0.5775  0.5381  0.4856   0.4784   132            0.4382  0.3631  0.4485   0.4240   88
BERT-based models
  Birch (MS MARCO) [5]      –       –       –        –        –              0.4578  0.3964  0.4645   0.4512   290,000
  Vanilla_BERT [3]          0.5666  0.5483  0.4714   0.4670   129,963        0.4633  0.4050  0.4750   0.4685   58,400
  CEDR_KNRM [3]             0.5746  0.5437  0.4618   0.4626   475,940        0.4936  0.4175  0.5101   0.4832   146,000
Aggregation method "2sum"
  DRSCM (SBERT)             0.6919  0.6335  0.5810   0.5637   55             0.5008  0.4098  0.5157   0.4816   34
  DRSCM (RepBERT)           0.6924  0.6300  0.5832   0.5620   28             0.5008  0.4237  0.5192   0.4941   11
Aggregation method "3sum"
  DRSCM (SBERT)             0.6918  0.6284  0.5814   0.5606   55             0.5064  0.4205  0.5238   0.4919   34
  DRSCM (RepBERT)           0.6870  0.6300  0.5828   0.5633   28             0.4972  0.4133  0.5140   0.4836   11
Aggregation method "Max"
  DRSCM (SBERT)             0.6898  0.6270  0.5878   0.5653   55             0.5016  0.4088  0.5177   0.4821   34
  DRSCM (RepBERT)           0.6924  0.6290  0.5840   0.5625   28             0.4980  0.4229  0.5118   0.4897   11
Aggregation method "Mean"
  DRSCM (SBERT)             0.6233  0.5807  0.5377   0.5211   55             0.4731  0.3932  0.4873   0.4601   34
  DRSCM (RepBERT)           0.6313  0.5828  0.5408   0.5228   28             0.4707  0.3974  0.4809   0.4594   11
Tab.1  Comparison of DRSCM with two different dense retrieval models and four aggregation strategies on GOV2 and Robust04
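The four aggregation strategies in Tab.1 combine per-passage relevance scores into a single document-level score. The exact definitions are not stated on this page; a plausible reading is that "Max" takes the best passage score, "Mean" averages all passage scores, and "2sum"/"3sum" sum the top-2 or top-3 passage scores. Under that assumption, the strategies can be sketched as:

```python
def aggregate(passage_scores, method="max", k=2):
    """Combine per-passage relevance scores into one document score.

    A sketch of one plausible reading of the strategies in Tab.1
    (the exact definitions are not given on this page):
      "Max"         -> highest passage score
      "Mean"        -> average over all passage scores
      "2sum"/"3sum" -> sum of the top-2 / top-3 passage scores
    """
    if not passage_scores:
        raise ValueError("document has no scored passages")
    ranked = sorted(passage_scores, reverse=True)
    if method == "max":
        return ranked[0]
    if method == "mean":
        return sum(ranked) / len(ranked)
    if method == "topk_sum":  # "2sum" with k=2, "3sum" with k=3
        return sum(ranked[:k])
    raise ValueError(f"unknown aggregation method: {method!r}")
```

Because top-k sums reward documents with a few strongly relevant passages while "Mean" is diluted by irrelevant passages, this reading would be consistent with "Mean" trailing the other three strategies in Tab.1.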
1 J. Devlin, M. W. Chang, K. Lee, K. Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. 2019, 4171–4186
2 O. Khattab, M. Zaharia. ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020, 39–48
3 N. Reimers, I. Gurevych. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 3982–3992
4 J. Zhan, J. Mao, Y. Liu, M. Zhang, S. Ma. RepBERT: contextualized text embeddings for first-stage retrieval. 2020, arXiv preprint arXiv: 2006.15498
5 L. Zhu, Y. Yang. ActBERT: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 8743–8752
6 Z. A. Yilmaz, S. Wang, W. Yang, H. Zhang, J. Lin. Applying BERT to document retrieval with birch. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 19–24
7 M. Li, E. Gaussier. KeyBLD: selecting key blocks with local pre-ranking for long document information retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021, 2207–2211