Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (1) : 13-26    https://doi.org/10.1007/s11704-016-5514-6
REVIEW ARTICLE
Image categorization with resource constraints: introduction, challenges and advances
Jian-Hao LUO, Wang ZHOU, Jianxin WU
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Abstract

As one of the most classic fields in computer vision, image categorization has attracted widespread interest. Numerous algorithms have been proposed in the community, and many of them have advanced the state of the art. However, most existing algorithms are designed without considering the available computing resources. Therefore, when applied to resource-constrained tasks, these algorithms fail to give satisfactory results. In this paper, we provide a comprehensive and in-depth introduction to recent developments in research on image categorization with resource constraints. While a large portion is based on our own work, we also briefly describe other elegant algorithms. Furthermore, we investigate recent developments in deep neural networks, with a focus on resource-constrained deep nets.

Keywords: image categorization, resource constraints, large scale classification, deep neural networks
Corresponding Author(s): Jianxin WU   
Just Accepted Date: 16 June 2016   Online First Date: 25 July 2016    Issue Date: 11 January 2017
 Cite this article:   
Jian-Hao LUO, Wang ZHOU, Jianxin WU. Image categorization with resource constraints: introduction, challenges and advances[J]. Front. Comput. Sci., 2017, 11(1): 13-26.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5514-6
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I1/13