Excellent Young Computer Scientists Forum

Label distribution learning for scene text detection
Haoyu MA1, Ningning LU1, Junjun MEI2, Tao GUAN2, Yu ZHANG1, Xin GENG1
1. School of Computer Science and Engineering, and the Key Lab of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189, China
2. ZTE Corporation, Nanjing 210012, China

Abstract Recently, segmentation-based scene text detection has drawn wide research interest due to its flexibility in describing scene text instances of arbitrary shapes, such as curved text. However, existing methods usually need complex post-processing stages to handle ambiguous labels, i.e., the labels of pixels near the text boundary, which may belong to either the text or the background. In this paper, we present a framework for segmentation-based scene text detection that learns from ambiguous labels. We use label distribution learning to handle the label ambiguity of text annotations, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art methods for segmentation-based scene text detection.
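
As a rough illustration of the idea of learning from ambiguous boundary labels, the sketch below replaces hard text/background labels with a per-pixel two-label distribution and scores a prediction with KL divergence. This is not the authors' formulation: the sigmoid mapping, the `tau` softness parameter, and the function names `soft_text_labels` and `kl_divergence` are assumptions introduced here for illustration only.

```python
import numpy as np


def soft_text_labels(signed_dist, tau=2.0):
    """Map signed distance to the text boundary (positive inside the text)
    to a per-pixel distribution [p_text, p_background].

    The sigmoid and the softness parameter tau are illustrative assumptions,
    not taken from the paper.
    """
    p_text = 1.0 / (1.0 + np.exp(-signed_dist / tau))
    return np.stack([p_text, 1.0 - p_text], axis=-1)


def kl_divergence(target, pred, eps=1e-8):
    """Mean KL(target || pred) over pixels, with clipping to avoid log(0)."""
    target = np.clip(target, eps, 1.0)
    pred = np.clip(pred, eps, 1.0)
    per_pixel = np.sum(target * (np.log(target) - np.log(pred)), axis=-1)
    return float(np.mean(per_pixel))


# One row of pixels crossing a text boundary located at distance 0.
signed_dist = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
target = soft_text_labels(signed_dist)

# Hard 0/1 labels for the same pixels, for comparison.
is_text = (signed_dist > 0).astype(float)
hard = np.stack([is_text, 1.0 - is_text], axis=-1)

loss_vs_self = kl_divergence(target, target)
loss_vs_hard = kl_divergence(target, hard)
```

In this sketch a pixel lying exactly on the boundary receives equal text and background probability, and pixels far from the boundary approach the hard labels, so the ambiguity is confined to a narrow band whose width is controlled by `tau`.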

Keywords: scene text detection, multi-task learning, label distribution learning

Corresponding Author(s): Yu ZHANG

Just Accepted Date: 26 October 2022
Issue Date: 17 January 2023