Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (6) : 176339    https://doi.org/10.1007/s11704-022-1446-5
Excellent Young Computer Scientists Forum
Label distribution learning for scene text detection
Haoyu MA1, Ningning LU1, Junjun MEI2, Tao GUAN2, Yu ZHANG1, Xin GENG1
1. School of Computer Science and Engineering, and the Key Lab of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189, China
2. ZTE Corporation, Nanjing 210012, China
Abstract

Segmentation-based scene text detection has recently drawn wide research interest because it can flexibly describe scene text instances of arbitrary shape, such as curved text. However, existing methods usually need complex post-processing stages to handle ambiguous labels, i.e., the labels of pixels near the text boundary, which may belong to either the text or the background. In this paper, we present a framework for segmentation-based scene text detection that learns from ambiguous labels. We use label distribution learning to handle the label ambiguity of text annotations, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art methods for segmentation-based scene text detection.
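To make the idea of learning from a label distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each pixel's soft label is a two-class distribution over {background, text} and that the prediction is compared with it through a KL divergence; the loss choice and the names `binary_kl` and `ldl_loss` are illustrative assumptions, since the abstract does not state the loss.

```python
import math

def binary_kl(target: float, pred: float, eps: float = 1e-7) -> float:
    """KL(target || pred) between two-class distributions over {background, text}.

    `target` and `pred` are the text-class probabilities; both are clamped
    to (eps, 1 - eps) so that the logarithms stay finite.
    """
    t = min(max(target, eps), 1.0 - eps)
    p = min(max(pred, eps), 1.0 - eps)
    return t * math.log(t / p) + (1.0 - t) * math.log((1.0 - t) / (1.0 - p))

def ldl_loss(soft_gt, pred_map):
    """Mean per-pixel KL divergence between a soft ground truth and a prediction."""
    total, count = 0.0, 0
    for gt_row, pred_row in zip(soft_gt, pred_map):
        for t, p in zip(gt_row, pred_row):
            total += binary_kl(t, p)
            count += 1
    return total / count

# Toy 2x3 maps: the middle column is an ambiguous boundary pixel (assumed values).
soft_gt = [[0.0, 0.3, 1.0], [0.0, 0.6, 1.0]]
soft_pred = [[0.05, 0.3, 0.95], [0.05, 0.6, 0.95]]  # matches the boundary softness
hard_pred = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]      # forces a 0/1 decision

print(ldl_loss(soft_gt, soft_pred))
print(ldl_loss(soft_gt, hard_pred))
```

In this toy setting, the prediction that reproduces the intermediate boundary values receives a much smaller loss than the one that commits to hard 0/1 labels on the ambiguous pixels.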

Keywords: scene text detection, multi-task learning, label distribution learning
Corresponding Author(s): Yu ZHANG   
Just Accepted Date: 26 October 2022   Issue Date: 17 January 2023
 Cite this article:   
Haoyu MA, Ningning LU, Junjun MEI, et al. Label distribution learning for scene text detection[J]. Front. Comput. Sci., 2023, 17(6): 176339.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1446-5
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I6/176339
Fig.1  Illustration of our pipeline. The red arrow marks the inference stage, and 1/2, 1/4, …, 1/32 indicate the scale ratio relative to the input image. The soft ground truth is the core component of our model: each pixel takes a value between 0 and 1 that represents a label distribution, instead of a discrete label indicating whether the pixel is inside a text region
Fig.2  Label generation. The first column shows the training images, the second column shows the shrunk text regions for the probability map, and the third and last columns show the Gaussian distribution and the distance distribution for the soft maps
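The two kinds of soft map named in the Fig. 2 caption can be illustrated with a small sketch. This page does not give the exact formulas, so the definitions below are assumptions made for illustration only: the distance map is each text pixel's distance to the nearest background pixel, normalized by the maximum distance, and the Gaussian map applies a Gaussian falloff (with a hypothetical `sigma`) to the gap between that distance and the maximum. All function names are hypothetical.

```python
import math

def distance_to_background(mask):
    """Euclidean distance from each text pixel to the nearest background pixel.

    Brute force, suitable only for tiny examples; assumes the mask contains
    at least one background (0) pixel.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if mask[y][x] == 0]
    dist = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in background)
    return dist

def distance_soft_map(mask):
    """Assumed distance-based soft label: d / d_max inside text, 0 outside."""
    dist = distance_to_background(mask)
    d_max = max(max(row) for row in dist)
    if d_max == 0:
        return dist
    return [[d / d_max for d in row] for row in dist]

def gaussian_soft_map(mask, sigma=1.0):
    """Assumed Gaussian soft label: exp(-(d_max - d)^2 / (2 sigma^2)) inside text."""
    dist = distance_to_background(mask)
    d_max = max(max(row) for row in dist)
    return [
        [math.exp(-((d_max - d) ** 2) / (2 * sigma ** 2)) if m == 1 else 0.0
         for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(dist, mask)
    ]

# A 5x7 toy mask with a 3x5 text region in the middle.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
dmap = distance_soft_map(mask)
gmap = gaussian_soft_map(mask, sigma=1.0)
print(dmap[2][3], dmap[1][1], dmap[0][0])
print(gmap[2][3], gmap[1][1], gmap[0][0])
```

Under these assumed definitions, pixels at the center of the text region get the value 1, pixels next to the boundary get intermediate values, and background pixels get 0, which is the qualitative behavior a soft ground truth is meant to capture.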
Method Precision Recall F-measure
ResNet-18 84.7 77.0 80.6
ResNet-18 + MTL 87.0 78.2 82.4
ResNet-18 + Gau 86.6 79.0 82.7
ResNet-18 + Dis 86.5 80.6 83.5
ResNet-50 90.5 77.9 83.7
ResNet-50 + MTL 92.0 78.0 84.4
ResNet-50 + Gau 93.3 84.2 88.5
ResNet-50 + Dis 95.4 86.2 90.6
Tab.1  Detection results with different settings on the MSRA-TD500 dataset
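For readers who want to check the tables, the sketch below assumes that the F-measure column is the standard harmonic mean of precision and recall, F = 2PR / (P + R). The page does not state the formula, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# ResNet-50 + Dis row of Tab.1: precision 95.4, recall 86.2.
print(round(f_measure(95.4, 86.2), 1))  # 90.6
# ResNet-50 + Gau row of Tab.1: precision 93.3, recall 84.2.
print(round(f_measure(93.3, 84.2), 1))  # 88.5
```

Because the reported precision and recall are already rounded to one decimal place, a value recomputed this way can differ from the reported F-measure by 0.1 (for example, the ResNet-18 row recomputes to 80.7 against a reported 80.6).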
Model Region Precision Recall F-measure
ResNet-50 + Dis Whole image 93.9 85.4 89.4
ResNet-50 + Dis Dilated text region 95.4 86.2 90.6
Tab.2  Detection results with different focused regions on the MSRA-TD500 dataset
Fig.3  Qualitative results of the proposed method, including curved text, multi-oriented text, vertical text, and long text lines
Method Precision Recall F-measure
TextSnake [12] 82.7 74.5 78.4
ATRR [32] 80.9 76.2 78.5
Mask TextSpotter [33] 82.5 75.6 78.6
TextField [34] 81.2 79.9 80.6
LOMO* [35] 87.6 79.3 83.3
CRAFT [36] 87.6 79.9 83.6
CSE [37] 81.4 79.1 80.2
PSENet-1s [13] 84.0 78.0 80.9
DB-ResNet-50(800) [14] 87.1 82.5 84.7
Ours-ResNet50(800, so) 89.1 82.5 85.7
Ours-ResNet50(800) 88.6 83.9 86.2
Tab.3  Detection results on the Total-Text dataset
Method Precision Recall F-measure
CTPN [8] 60.4 53.8 56.9
TextSnake [12] 77.8 82.7 80.1
ATRR [32] 80.2 80.1 80.1
TextField [34] 79.8 83.0 81.4
CRAFT [36] 81.1 86.0 83.5
PSENet-1s [13] 79.7 84.8 82.2
SAE(990) [38] 82.7 77.8 80.1
DB-ResNet-50(800) [14] 86.9 80.2 83.7
Ours-ResNet50(800, so) 86.4 83.3 84.8
Ours-ResNet50(800) 87.9 84.6 86.2
Tab.4  Detection results on the SCUT-CTW1500 dataset
Method Precision Recall F-measure
CTPN [8] 74.2 51.6 60.9
EAST [10] 83.6 73.5 78.2
SSTD [39] 80.2 73.9 76.9
WordSup [40] 79.3 77 78.2
Corner [41] 94.1 70.1 80.7
TB [19] 87.2 76.7 81.7
RRD [42] 85.6 79 82.2
MCN [43] 72 80 76
TextSnake [12] 84.9 80.4 82.6
PSENet-1s [13] 86.9 84.5 85.7
CRAFT [36] 89.8 84.3 86.9
SAE(720) [38] 85.1 84.1 84.8
SAE(990) [38] 88.3 85.0 86.6
DB-ResNet50(1152) [14] 91.8 83.2 87.3
Ours-ResNet50(1152, so) 90.4 83.4 86.7
Ours-ResNet50(1152) 91.5 84.7 87.9
Tab.5  Detection results on the ICDAR 2015 dataset
Method Precision Recall F-measure
He et al. [44] 71 61 69
DeepReg [45] 77 70 74
RRPN [46] 82 68 74
RRD [42] 87 73 79
MCN [43] 88 79 83
PixelLink [11] 83 73.2 77.8
Corner [41] 87.6 76.2 81.5
TextSnake [12] 83.2 73.9 78.3
Xue et al. [47] 83.0 77.4 80.1
MSR [48] 87.4 76.7 81.7
CRAFT [36] 88.2 78.2 82.9
SAE [38] 84.2 81.7 82.9
DB-ResNet50(736) [14] 91.5 79.2 84.9
Ours-ResNet50(736, so) 91.4 82.4 86.7
Ours-ResNet50(736) 95.4 86.2 90.6
Tab.6  Detection results on the MSRA-TD500 dataset
Method FPS
DeepReg [45] 1.1
RRD [42] 10
PixelLink [11] 3
Corner [41] 5.7
TextSnake [12] 1.1
CRAFT [36] 8.6
DB-ResNet18(736) [14] 62
DB-ResNet50(736) [14] 32
Ours-ResNet18(736) 60
Ours-ResNet50(736) 32
Tab.7  Comparison of inference speed on the MSRA-TD500 dataset
1 Zhu A, Uchida S. Scene word recognition from pieces to whole. Frontiers of Computer Science, 2019, 13(2): 292–301
2 He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770−778
3 Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015, 3431−3440
4 Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 21−37
5 Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149
6 Jiang H, Cheng M M, Li S J, Borji A, Wang J. Joint salient object detection and existence prediction. Frontiers of Computer Science, 2019, 13(4): 778–788
7 Li M, Mao J, Qi X, Jin C. A framework for cloned vehicle detection. Frontiers of Computer Science, 2020, 14(5): 145609
8 Tian Z, Huang W, He T, He P, Qiao Y. Detecting text in natural image with connectionist text proposal network. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 56−72
9 Liao M, Shi B, Bai X, Wang X, Liu W. TextBoxes: a fast text detector with a single deep neural network. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017, 4161−4167
10 Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J. EAST: an efficient and accurate scene text detector. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 2642−2651
11 Deng D, Liu H, Li X, Cai D. PixelLink: detecting scene text via instance segmentation. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018, 6773−6780
12 Long S, Ruan J, Zhang W, He X, Wu W, Yao C. TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 19−35
13 Wang W, Xie E, Li X, Hou W, Lu T, Yu G, Shao S. Shape robust text detection with progressive scale expansion network. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 9328−9337
14 Liao M, Wan Z, Yao C, Chen K, Bai X. Real-time scene text detection with differentiable binarization. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 11474−11481
15 Shi B, Bai X, Belongie S. Detecting oriented text in natural images by linking segments. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 3482−3490
16 Jiang Y, Zhu X, Wang X, Yang S, Li W, Wang H, Fu P, Luo Z. R2CNN: rotational region CNN for orientation robust scene text detection. 2017, arXiv preprint arXiv:1706.09579
17 Gao B B, Xing C, Xie C W, Wu J, Geng X. Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 2017, 26(6): 2825–2838
18 Geng X, Yin C, Zhou Z H. Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(10): 2401–2412
19 Liao M, Shi B, Bai X. TextBoxes++: a single-shot oriented scene text detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676–3690
20 Liu Y, Jin L. Deep matching prior network: Toward tighter multi-oriented text detection. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 3454−3461
21 Yao C, Bai X, Sang N, Zhou X, Zhou S, Cao Z. Scene text detection via holistic, multi-channel prediction. 2016, arXiv preprint arXiv:1606.09002
22 Cour T, Sapp B, Jordan C, Taskar B. Learning from ambiguously labeled images. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, 919−926
23 Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y. Deformable convolutional networks. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 764−773
24 Zhu X, Hu H, Lin S, Dai J. Deformable ConvNets V2: more deformable, better results. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 9300−9308
25 Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 2315−2324
26 Ch’ng C K, Chan C S. Total-text: a comprehensive dataset for scene text detection and recognition. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. 2017, 935−942
27 Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar V R, Lu S, Shafait F, Uchida S, Valveny E. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. 2015, 1156−1160
28 Yao C, Bai X, Liu W, Ma Y, Tu Z. Detecting texts of arbitrary orientations in natural images. In: Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012, 1083−1090
29 Yao C, Bai X, Liu W. A unified framework for multioriented text detection and recognition. IEEE Transactions on Image Processing, 2014, 23(11): 4737–4749
30 Liu Y, Jin L, Zhang S, Luo C, Zhang S. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition, 2019, 90: 337–345
31 Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248−255
32 Wang X, Jiang Y, Luo Z, Liu C L, Choi H, Kim S. Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 6442−6451
33 Lyu P, Liao M, Yao C, Wu W, Bai X. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 71−88
34 Xu Y, Wang Y, Zhou W, Wang Y, Yang Z, Bai X. TextField: learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing, 2019, 28(11): 5566–5579
35 Zhang C, Liang B, Huang Z, En M, Han J, Ding E, Ding X. Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 10544−10553
36 Baek Y, Lee B, Han D, Yun S, Lee H. Character region awareness for text detection. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 9357−9366
37 Liu Z, Lin G, Yang S, Liu F, Lin W, Goh W L. Towards robust curve text detection with conditional spatial expansion. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 7261−7270
38 Tian Z, Shu M, Lyu P, Li R, Zhou C, Shen X, Jia J. Learning shape-aware embedding for scene text detection. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 4229−4238
39 He P, Huang W, He T, Zhu Q, Qiao Y, Li X. Single shot text detector with regional attention. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 3066−3074
40 Hu H, Zhang C, Luo Y, Wang Y, Han J, Ding E. WordSup: exploiting word annotations for character based text detection. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 4950−4959
41 Lyu P, Yao C, Wu W, Yan S, Bai X. Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7553−7563
42 Liao M, Zhu Z, Shi B, Xia G S, Bai X. Rotation-sensitive regression for oriented scene text detection. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 5909−5918
43 Liu Z, Lin G, Yang S, Feng J, Lin W, Goh W L. Learning Markov clustering networks for scene text detection. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 6936−6944
44 He T, Huang W, Qiao Y, Yao J. Text-attentional convolutional neural network for scene text detection. IEEE Transactions on Image Processing, 2016, 25(6): 2529–2541
45 He W, Zhang X Y, Yin F, Liu C L. Deep direct regression for multi-oriented scene text detection. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 745−753
46 Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018, 20(11): 3111–3122
47 Xue C, Lu S, Zhan F. Accurate scene text detection through border semantics awareness and bootstrapping. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 370−387
48 Xue C, Lu S, Zhang W. MSR: multi-scale shape regression for scene text detection. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2019, 989−995
[1] Quan FENG, Songcan CHEN. Learning multi-tasks with inconsistent labels by using auxiliary big task[J]. Front. Comput. Sci., 2023, 17(5): 175342-.
[2] Xinyue DONG, Tingjin LUO, Ruidong FAN, Wenzhang ZHUGE, Chenping HOU. Active label distribution learning via kernel maximum mean discrepancy[J]. Front. Comput. Sci., 2023, 17(4): 174327-.
[3] Jiansheng WU, Chuangchuang LAN, Xuelin YE, Jiale DENG, Wanqing HUANG, Xueni YANG, Yanxiang ZHU, Haifeng HU. Disclosing incoherent sparse and low-rank patterns inside homologous GPCR tasks for better modelling of ligand bioactivities[J]. Front. Comput. Sci., 2022, 16(4): 164322-.
[4] Yi REN, Ning XU, Miaogen LING, Xin GENG. Label distribution for multimodal machine learning[J]. Front. Comput. Sci., 2022, 16(1): 161306-.
[5] Huiying ZHANG, Yu ZHANG, Xin GENG. Practical age estimation using deep label distribution learning[J]. Front. Comput. Sci., 2021, 15(3): 153318-.
[6] Yuling MA, Chaoran CUI, Jun YU, Jie GUO, Gongping YANG, Yilong YIN. Multi-task MIML learning for pre-course student performance prediction[J]. Front. Comput. Sci., 2020, 14(5): 145313-.
[7] Miaogen LING, Xin GENG. Soft video parsing by label distribution learning[J]. Front. Comput. Sci., 2019, 13(2): 302-317.
[8] Hao ZHENG, Xin GENG. Facial expression recognition via weighted group sparsity[J]. Front. Comput. Sci., 2017, 11(2): 266-275.