Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184

Frontiers of Information Technology & Electronic Engineering  2015, Vol. 16 Issue (10): 817-828   https://doi.org/10.1631/FITEE.1500070
Beyond bag of latent topics: spatial pyramid matching for scene category recognition
Fu-xiang LU1,*, Jun HUANG2,*
1. School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
2. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
Abstract

We propose a heterogeneous, mid-level feature-based method for recognizing natural scene categories. The proposed feature uses a spatial pyramid to introduce spatial information among latent topics, which are obtained by applying probabilistic latent semantic analysis (pLSA) to the bag-of-words representation. The proposed feature consistently performs better than standard pLSA, whose performance is adversely affected in many cases by the loss of spatial information. By combining the various interest point detectors and local region descriptors used in the bag-of-words model, the proposed feature yields further improvement on diverse scene category recognition tasks. We also propose a two-stage framework for multi-class classification. In the first stage, for each possible detector/descriptor pair, adaptive boosting classifiers select the most discriminative topics and then compute the posterior probabilities of an unknown image from the selected topics. The second stage uses the prod-max rule to combine the information coming from multiple sources and assigns the unknown image to the scene category with the highest 'final' posterior probability. Experimental results on three benchmark scene datasets show that the proposed method outperforms most state-of-the-art methods.
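
To make the spatial-pyramid idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each local region has a centre position and an already computed topic posterior vector (for example from pLSA), and it accumulates those posteriors per grid cell at each pyramid level. The function name `pyramid_topic_histogram`, its parameters, the 2^l x 2^l grid layout, and the unweighted concatenation are all assumptions made for illustration; the abstract does not specify these details.

```python
def pyramid_topic_histogram(points, topic_probs, width, height, levels=2):
    """Concatenate per-cell topic histograms over a spatial pyramid.

    points      : list of (x, y) region centres in pixels (hypothetical input)
    topic_probs : list of per-region topic posteriors, one list per region
    levels      : highest pyramid level; level l uses a 2**l x 2**l grid
    """
    n_topics = len(topic_probs[0]) if topic_probs else 0
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        # One topic histogram per grid cell at this level.
        hist = [[0.0] * n_topics for _ in range(cells * cells)]
        for (x, y), probs in zip(points, topic_probs):
            # Map the region centre to a cell; clamp points on the far edge.
            cx = min(int(x / width * cells), cells - 1)
            cy = min(int(y / height * cells), cells - 1)
            cell = hist[cy * cells + cx]
            for z, p in enumerate(probs):
                cell[z] += p
        for cell in hist:
            feature.extend(cell)
    # L1-normalise so images with different region counts are comparable.
    total = sum(feature)
    return [v / total for v in feature] if total > 0 else feature


# Two regions in opposite corners of a 100 x 100 image, two topics.
example = pyramid_topic_histogram(
    points=[(10, 10), (90, 90)],
    topic_probs=[[1.0, 0.0], [0.0, 1.0]],
    width=100, height=100, levels=1,
)
```

At level 0 the two regions share a single cell, so the histogram matches a plain bag of topics. At level 1 they fall into different cells, which is the spatial information that a flat topic histogram discards.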

Key words: Scene category recognition; Probabilistic latent semantic analysis; Bag-of-words; Adaptive boosting
Received: 2015-03-07      Published: 2015-10-12
Corresponding Author(s): Fu-xiang LU, Jun HUANG
Cite this article:
Fu-xiang LU, Jun HUANG. Beyond bag of latent topics: spatial pyramid matching for scene category recognition. Front. Inform. Technol. Electron. Eng., 2015, 16(10): 817-828.
Link to this article:
https://academic.hep.com.cn/fitee/CN/10.1631/FITEE.1500070
https://academic.hep.com.cn/fitee/CN/Y2015/V16/I10/817