Local structured representation for generic object detection

Front. Comput. Sci., 2017, Vol. 11, Issue 4: 632-648
https://doi.org/10.1007/s11704-016-5530-6

RESEARCH ARTICLE

Junge ZHANG¹,³, Kaiqi HUANG¹,²,³, Tieniu TAN¹,²,³, Zhaoxiang ZHANG²,³

¹. Center for Research on Intelligent Perception and Computing, Chinese Academy of Sciences, Beijing 100190, China
². Research Center for Brain-inspired Intelligence, Chinese Academy of Sciences, Beijing 100190, China
³. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China


Abstract

Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual features and the topology model. First, at the feature level, we propose the Local Structured Descriptor to capture an object’s local structure effectively, and develop descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that the method achieves very competitive performance on these benchmarks.

Keywords: Local Structured Descriptor; Local Structured Model; Object Representation; Object Structure; Object Detection; PASCAL VOC

Corresponding Author(s): Junge ZHANG, Kaiqi HUANG, Tieniu TAN, Zhaoxiang ZHANG

Just Accepted Date: 12 April 2016; Online First Date: 17 March 2017; Issue Date: 26 July 2017

Cite this article:

Junge ZHANG, Kaiqi HUANG, Tieniu TAN, et al. Local structured representation for generic object detection[J]. Front. Comput. Sci., 2017, 11(4): 632-648.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5530-6
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I4/632
