Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2017, Vol. 11, Issue 2: 219-229. https://doi.org/10.1007/s11704-016-6066-5
RESEARCH ARTICLE
Attribute-based supervised deep learning model for action recognition
Kai CHEN1, Guiguang DING1, Jungong HAN2
1. School of Software, Tsinghua University, Beijing 100084, China
2. Department of Computer Science, Northumbria University, Newcastle NE1 8ST, UK
Abstract

Deep learning has been the most popular feature learning method for a variety of computer vision applications over the past three years. Not surprisingly, this technique, especially the convolutional neural network (ConvNet) structure, has been exploited to recognize human actions with great success. Most existing algorithms directly adopt the basic ConvNet structure, which works well in ideal situations, e.g., under stable lighting conditions. However, its performance degrades significantly when image appearance varies considerably within the same category (intra-class variation). To solve this problem, we propose a new method that integrates semantically meaningful attributes into the hierarchical structure of deep learning. The idea is to add simple yet effective attributes to the category level of ConvNets so that the attribute information can drive the learning procedure. Experimental results on three popular action recognition databases show that embedding multiple auxiliary attributes into the deep learning framework significantly improves classification accuracy.
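The abstract does not spell out the training objective, so the following is only a minimal sketch under stated assumptions. It assumes that attribute supervision is implemented as an auxiliary multi-label head whose binary cross-entropy is added, with a hypothetical weight `attr_weight`, to the usual softmax cross-entropy over action categories. It also assumes that attribute targets are defined per category (the hypothetical `CLASS_ATTRIBUTES` table). All names and values are illustrative and are not taken from the authors' implementation.

```python
import math
from typing import Sequence

# Hypothetical category-level attribute table (an assumption, not from the paper):
# each action class maps to a binary attribute vector, e.g. [uses_legs, holds_object].
CLASS_ATTRIBUTES = {
    0: [1.0, 0.0],  # e.g. "running"
    1: [0.0, 1.0],  # e.g. "drinking"
}


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a softmax over category logits, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sigmoid_binary_cross_entropy(logit: float, target: float) -> float:
    """Binary cross-entropy for one attribute, computed stably from its logit.

    max(z, 0) - z*t + log(1 + exp(-|z|)) equals
    -[t*log(s) + (1 - t)*log(1 - s)] with s = sigmoid(z).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def attribute_supervised_loss(
    class_logits: Sequence[float],
    label: int,
    attr_logits: Sequence[float],
    attr_targets: Sequence[float],
    attr_weight: float = 0.5,  # hypothetical balancing weight
) -> float:
    """Assumed joint objective: category loss plus weighted mean attribute loss."""
    if len(attr_logits) != len(attr_targets):
        raise ValueError("one target per attribute logit is required")
    class_loss = softmax_cross_entropy(class_logits, label)
    attr_loss = sum(
        sigmoid_binary_cross_entropy(z, t) for z, t in zip(attr_logits, attr_targets)
    ) / len(attr_logits)
    return class_loss + attr_weight * attr_loss


if __name__ == "__main__":
    label = 0
    # Attribute targets are looked up from the category label (category-level attributes).
    loss = attribute_supervised_loss([2.0, -1.0], label, [1.5, -2.0], CLASS_ATTRIBUTES[label])
    print(f"joint loss: {loss:.4f}")
```

Because the attribute targets in this sketch are derived from the category label, no extra per-video annotation is needed. Setting `attr_weight` to zero recovers plain category-level training, which makes the contribution of the attribute term easy to isolate.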

Keywords: action recognition, convolutional neural network, attribute
Corresponding Author(s): Guiguang DING   
Just Accepted Date: 23 December 2016; Online First Date: 17 March 2017; Issue Date: 06 April 2017
 Cite this article:   
Kai CHEN,Guiguang DING,Jungong HAN. Attribute-based supervised deep learning model for action recognition[J]. Front. Comput. Sci., 2017, 11(2): 219-229.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6066-5
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I2/219