Label distribution learning for scene text detection

doi:10.1007/s11704-022-1446-5

Front. Comput. Sci.

2023, Vol. 17

Issue (6) : 176339 https://doi.org/10.1007/s11704-022-1446-5

Excellent Young Computer Scientists Forum

Haoyu MA¹, Ningning LU¹, Junjun MEI², Tao GUAN², Yu ZHANG¹, Xin GENG¹

¹. School of Computer Science and Engineering, and the Key Lab of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189, China
². ZTE Corporation, Nanjing 210012, China


Abstract

Recently, segmentation-based scene text detection has drawn wide research interest because it can flexibly describe scene text instances of arbitrary shapes, such as curved text. However, existing methods usually need complex post-processing stages to handle ambiguous labels, i.e., the labels of pixels near the text boundary, which may belong to either the text or the background. In this paper, we present a framework for segmentation-based scene text detection that learns from ambiguous labels. We use label distribution learning to handle the label ambiguity of text annotations, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art methods for segmentation-based scene text detection.

Keywords: scene text detection, multi-task learning, label distribution learning

Corresponding Author(s): Yu ZHANG

Just Accepted Date: 26 October 2022
Issue Date: 17 January 2023

Cite this article:

Haoyu MA, Ningning LU, Junjun MEI, et al. Label distribution learning for scene text detection[J]. Front. Comput. Sci., 2023, 17(6): 176339.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1446-5
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I6/176339

Fig.1 Illustration of our pipeline. The red arrow marks the inference stage, and 1/2, 1/4, … and 1/32 indicate the scale ratio relative to the input image. The soft ground truth is the core component of our model: each pixel holds a value in the range 0-1 that represents a label distribution, rather than a discrete label indicating whether the pixel lies inside a text region
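To make the idea of a per-pixel label distribution concrete, the sketch below treats a soft value p as the two-class distribution (1 - p, p) over background and text, and scores a predicted map against it with a mean KL divergence. This is only an illustration under assumptions: the page does not state the loss used in the paper, and the function name `binary_kl_loss` and its `eps` parameter are hypothetical.

```python
import numpy as np


def binary_kl_loss(soft_target, predicted, eps=1e-7):
    """Mean per-pixel KL(target || predicted) for two-class distributions.

    Both inputs are maps of text probabilities p in [0, 1]; each pixel is
    read as the distribution (1 - p, p) over {background, text}.
    Assumption: KL divergence is a common choice in label distribution
    learning, not necessarily the loss used in the paper.
    """
    # Clip away exact 0 and 1 so the logarithms stay finite.
    t = np.clip(np.asarray(soft_target, dtype=np.float64), eps, 1.0 - eps)
    q = np.clip(np.asarray(predicted, dtype=np.float64), eps, 1.0 - eps)
    kl = t * np.log(t / q) + (1.0 - t) * np.log((1.0 - t) / (1.0 - q))
    return float(kl.mean())


# A soft ground truth: values between 0 and 1 near the text boundary.
soft_gt = np.array([[0.0, 0.3],
                    [0.7, 1.0]])
uniform_guess = np.full(soft_gt.shape, 0.5)

loss_perfect = binary_kl_loss(soft_gt, soft_gt)        # identical maps
loss_uniform = binary_kl_loss(soft_gt, uniform_guess)  # uninformative guess
```

A prediction that matches the soft ground truth gives zero loss, while any mismatch gives a positive value, so boundary pixels with intermediate targets are not forced toward a hard 0 or 1.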

Fig.2 Label generation. The first column shows the training images, the second column shows the shrunk text regions used for the probability map, and the third and last columns show the Gaussian distribution and the distance distribution, respectively, used for the soft maps
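One way such soft maps could be generated from a binary text mask is sketched below. It is a minimal illustration, not the paper's procedure: it assumes the distance map is each text pixel's distance to the nearest background pixel, normalized by the maximum, and that the Gaussian map peaks at the region center and decays toward the boundary. The function names and the `sigma` parameter are hypothetical, and the brute-force distance computation is only suitable for small masks.

```python
import numpy as np


def distance_to_background(mask):
    """Euclidean distance from each text pixel to the nearest background pixel.

    Brute force over all pixel pairs, so only meant for small masks.
    Background pixels get distance 0.
    """
    mask = np.asarray(mask, dtype=bool)
    bg = np.argwhere(~mask)  # (B, 2) background coordinates
    fg = np.argwhere(mask)   # (F, 2) text coordinates, row-major order
    dist = np.zeros(mask.shape, dtype=np.float64)
    if len(bg) == 0 or len(fg) == 0:
        return dist  # degenerate mask: no boundary to measure against
    diff = fg[:, None, :] - bg[None, :, :]  # (F, B, 2) pairwise offsets
    # Boolean indexing assigns in row-major order, matching np.argwhere.
    dist[mask] = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dist


def soft_label_maps(mask, sigma=0.5):
    """Return (distance_map, gaussian_map), both in [0, 1].

    Assumed forms: distance_map = d / max(d);
    gaussian_map = exp(-(1 - distance_map)^2 / (2 * sigma^2)) inside text.
    """
    d = distance_to_background(mask)
    peak = d.max()
    dist_map = d / peak if peak > 0 else d
    gauss = np.exp(-((1.0 - dist_map) ** 2) / (2.0 * sigma ** 2))
    gauss_map = np.where(d > 0, gauss, 0.0)  # keep background at 0
    return dist_map, gauss_map


# A 3x5 rectangular "text region" inside a 5x7 image.
mask = np.zeros((5, 7), dtype=bool)
mask[1:4, 1:6] = True
dist_map, gauss_map = soft_label_maps(mask)
```

In this toy mask the region center receives the value 1 in both maps, pixels on the region edge receive smaller values, and the background stays at 0, which mirrors the idea that labels near the text boundary are the ambiguous ones.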

Tab.1 Detection results with different settings on the MSRA-TD500 dataset

Tab.2 Detection results with different focused regions on the MSRA-TD500 dataset

Fig.3 Qualitative results of the proposed method, including curved text, multi-oriented text, vertical text, and long text lines

Tab.3 Detection results on the Total-Text dataset

Tab.4 Detection results on the SCUT-CTW1500 dataset

Tab.5 Detection results on the ICDAR 2015 dataset

Tab.6 Detection results on the MSRA-TD500 dataset

Tab.7 Comparison of inference speed on the MSRA-TD500 dataset



