Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2019, Vol. 13, Issue 2: 292-301. https://doi.org/10.1007/s11704-017-6420-2
RESEARCH ARTICLE
Scene word recognition from pieces to whole
Anna ZHU1, Seiichi UCHIDA2
1. SCST, Wuhan University of Technology, Wuhan 430000, China
2. ISEE-AIT, Kyushu University, Fukuoka 819-0395, Japan
Abstract

Convolutional neural networks (CNNs) have been highly successful at object classification. For character classification, we found that training and testing CNNs on accurately segmented character regions yields higher accuracy than using roughly segmented regions. We therefore aim to extract complete character regions from scene images. Text in natural scene images contrasts clearly with its background, and many methods attempt to extract characters through different segmentation techniques. However, for blurred, occluded, or complex-background cases, those methods may produce adjoined or over-segmented characters. In this paper, we propose a scene word recognition model that, after cluster-based segmentation, integrates words from small pieces into wholes. The segmented connected components are classified into four types: background, individual character proposals, adjoined characters, and stroke proposals. Individual character proposals are fed directly to a CNN trained on accurately segmented character images. A sliding window strategy is applied to adjoined character regions. Stroke proposals are treated as fragments of entire characters, whose locations are estimated by a stroke spatial distribution system. The characters estimated from adjoined characters and stroke proposals are then classified by a CNN trained on roughly segmented character images. Finally, a lexicon-driven integration method produces the final word recognition results. Compared with other word recognition methods, our method achieves comparable performance on the Street View Text, ICDAR 2003, and ICDAR 2013 benchmark databases. Moreover, it can handle occluded and improperly segmented text images.
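
The abstract does not say how the lexicon-driven integration step scores candidate words. As a rough illustration of what lexicon-driven correction can look like in general, the sketch below assumes the simplest form: per-character CNN outputs have already been joined into a raw string, and the lexicon word with the smallest Levenshtein edit distance is returned. The function names `edit_distance` and `closest_lexicon_word` are hypothetical, and this nearest-word rule is a stand-in technique, not the integration method of the paper.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def closest_lexicon_word(raw: str, lexicon: Iterable[str]) -> str:
    """Return the lexicon word nearest to raw (case-insensitive).

    Assumption for illustration only: ties go to the earliest lexicon entry,
    and an empty lexicon raises ValueError (the behaviour of min()).
    """
    raw = raw.lower()
    return min(lexicon, key=lambda w: edit_distance(raw, w.lower()))
```

For example, `closest_lexicon_word("he1lo", ["hello", "help", "world"])` selects `"hello"`, since a single substitution separates the two strings. A practical system would more likely weight substitutions by classifier confidence than treat all errors equally.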

Keywords: text recognition, convolutional neural networks, cluster-based segmentation, character integration
Corresponding Author(s): Anna ZHU   
Just Accepted Date: 29 September 2017   Online First Date: 09 May 2018    Issue Date: 08 April 2019
 Cite this article:   
Anna ZHU,Seiichi UCHIDA. Scene word recognition from pieces to whole[J]. Front. Comput. Sci., 2019, 13(2): 292-301.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-017-6420-2
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I2/292