Scene word recognition from pieces to whole
Anna ZHU1, Seiichi UCHIDA2
1. SCST, Wuhan University of Technology, Wuhan 430000, China
2. ISEE-AIT, Kyushu University, Fukuoka 819-0395, Japan

Abstract Convolutional neural networks (CNNs) have been highly successful at object classification. For character classification, we found that training and testing CNNs on accurately segmented character regions yields higher accuracy than using roughly segmented regions. We therefore aim to extract complete character regions from scene images. Text in natural scene images contrasts clearly with the surfaces it is attached to, and many methods attempt to extract characters through different segmentation techniques. However, for blurred, occluded, or complex-background cases, those methods may produce adjoined or over-segmented characters. In this paper, we propose a scene word recognition model that, after cluster-based segmentation, integrates small pieces into whole words. The segmented connected components are classified into four types: background, individual character proposals, adjoined characters, and stroke proposals. Individual character proposals are fed directly to a CNN trained on accurately segmented character images. A sliding window strategy is applied to adjoined character regions. Stroke proposals are treated as fragments of entire characters, whose locations are estimated by a stroke spatial distribution system. The characters estimated from adjoined characters and stroke proposals are then classified by a CNN trained on roughly segmented character images. Finally, a lexicon-driven integration method produces the final word recognition results. Our method achieves performance comparable to other word recognition methods on the Street View Text, ICDAR 2003, and ICDAR 2013 benchmark databases. Moreover, it can recognize occluded text and improperly segmented text images.
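As a rough illustration of what a lexicon-driven final step can look like, the sketch below maps a noisy character sequence to the closest lexicon entry. It is not the integration method of the paper, whose details are not given in the abstract: plain Levenshtein distance is used as a stand-in, and the names `edit_distance` and `pick_lexicon_word` are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def pick_lexicon_word(raw: str, lexicon: List[str]) -> str:
    """Return the lexicon entry closest to the raw per-character output.

    Assumption: the character classifiers have already produced a string
    `raw`; ties are broken by lexicon order.
    """
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    raw = raw.lower()
    return min(lexicon, key=lambda w: edit_distance(raw, w.lower()))


if __name__ == "__main__":
    # One misread character ("O" instead of "C") is corrected by the lexicon.
    print(pick_lexicon_word("STARBUOKS", ["starbucks", "subway", "station"]))
```

A real system would more likely weight candidates by classifier confidence rather than treat every character error equally; the distance-only version is kept here for brevity.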
Keywords
text recognition
convolutional neural networks
cluster-based segmentation
character integration
Corresponding Author(s):
Anna ZHU
Just Accepted Date: 29 September 2017
Online First Date: 09 May 2018
Issue Date: 08 April 2019