Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732 (Print)

ISSN 2095-2740 (Online)

CN 10-1028/TM

Front Elect Electr Eng, 2012, Vol. 7, Issue 2: 224-241    https://doi.org/10.1007/s11460-011-0175-6
RESEARCH ARTICLE
Action recognition from arbitrary views using 3D-key-pose set
Junxia GU, Xiaoqing DING, Shenjing WANG
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract

Recovering a three-dimensional (3D) human pose sequence from an arbitrary view is very difficult, owing to the loss of depth information and to self-occlusion. In this paper, a view-independent 3D-key-pose set is selected from 3D action samples in order to represent and recognize those same actions from a single camera or a few cameras, without any restriction on the relative orientation between cameras and subjects. First, the 3D-key-pose set is selected from the 3D human joint sequences of training action samples reconstructed from multiple viewpoints. Second, the 3D key pose sequence that best matches the observation sequence is selected from the 3D-key-pose set to represent an observation sequence from an arbitrary view. The 3D key pose sequence contains many discriminative, view-independent key poses, but it cannot accurately describe the pose of every frame in the observation sequence. For this reason, the pose and the dynamics of an action are modeled separately: exemplar-based embedding and the probability of unique key poses are used to model the pose property, while a complementary dynamic feature is extracted to distinguish actions that share the same poses but differ in their dynamics. Finally, these action models are fused to recognize an observation sequence from a single camera or a few cameras. The effectiveness of the proposed approach is demonstrated by experiments on the IXMAS dataset.
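The abstract does not specify the combination rule for the final fused decision; the following minimal Python sketch only illustrates score-level fusion of the pose and dynamic models across cameras. The inputs `pose_scores` and `dyn_scores` and the equal fusion weights are hypothetical, not taken from the paper.

```python
import numpy as np

# A minimal sketch of the score-level fusion mentioned above. A weighted sum
# of per-camera, per-class log-likelihoods is assumed here as the combination
# rule; w_pose and w_dyn are hypothetical fusion weights.
def fuse_and_classify(pose_scores, dyn_scores, w_pose=0.5, w_dyn=0.5):
    """pose_scores, dyn_scores: arrays of shape (n_cameras, n_classes)."""
    # Average each model's scores over the available cameras, then combine
    # the pose model and the dynamic model with fixed weights.
    fused = w_pose * np.mean(pose_scores, axis=0) \
          + w_dyn * np.mean(dyn_scores, axis=0)
    return int(np.argmax(fused))  # index of the recognized action class
```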

Keywords: action representation; action recognition; 3D-key-pose set; 3D key pose sequence; action models fusion
Corresponding author: Xiaoqing DING, Email: dingxq@tsinghua.edu.cn
Issue Date: 05 June 2012
 Cite this article:   
Junxia GU, Xiaoqing DING, Shenjing WANG. Action recognition from arbitrary views using 3D-key-pose set[J]. Front Elect Electr Eng, 2012, 7(2): 224-241.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-011-0175-6
https://academic.hep.com.cn/fee/EN/Y2012/V7/I2/224
Fig.1  Flow chart of the proposed action recognition approach using 3D-key-pose set
Fig.2  Key poses in action sequence
Fig.3  Single pose in action sequence
Fig.4  Example of 3D key pose
Select the 3D-key-pose set $P_{\mathrm{key}}=\{P_{\mathrm{key}}^{1},\dots,P_{\mathrm{key}}^{k},\dots,P_{\mathrm{key}}^{N_{\mathrm{key}}}\}$:
Step 1: Initialize the 3D-key-pose set $P_{\mathrm{key}}=\varnothing$;
Step 2: Find $y^{*}\in\zeta$, $y^{*}\notin P_{\mathrm{key}}$, such that the exemplar-based embedding classifier using the exemplar set $P_{\mathrm{key}}\cup\{y^{*}\}$ achieves the best recognition performance on the training set; if several candidates perform equally well, one is picked at random. Set $P_{\mathrm{key}}=P_{\mathrm{key}}\cup\{y^{*}\}$;
Step 3: Repeat Step 2 until $N_{\mathrm{key}}$ key poses are selected or the recognition rate on the training set converges to a stable value (a code sketch follows Table 1).
Tab.1  3D-key-pose set selection algorithm
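Below is a minimal Python sketch of the greedy selection in Table 1. The candidate pool $\zeta$ and the exemplar-based embedding classifier are abstracted behind the hypothetical `eval_accuracy` callback, and `tol` is an assumed convergence tolerance; neither is part of the paper's implementation.

```python
import random

def select_key_poses(candidates, n_key, eval_accuracy, tol=1e-6):
    """Greedy forward selection of the 3D-key-pose set (Table 1 sketch).

    candidates:    list of candidate 3D poses from the training samples
    n_key:         maximum number of key poses N_key
    eval_accuracy: hypothetical callable; given a list of candidate indices,
                   it evaluates the exemplar-based embedding classifier on
                   the training set and returns the recognition rate
    """
    selected, prev_rate = [], -1.0
    remaining = list(range(len(candidates)))
    while len(selected) < n_key and remaining:
        # Step 2: score every remaining candidate when added to the set.
        rates = [(eval_accuracy(selected + [i]), i) for i in remaining]
        best_rate = max(r for r, _ in rates)
        # Ties in recognition performance are broken at random.
        best = random.choice([i for r, i in rates if r == best_rate])
        selected.append(best)
        remaining.remove(best)
        # Step 3: stop once the training recognition rate has converged.
        if abs(best_rate - prev_rate) < tol:
            break
        prev_rate = best_rate
    return [candidates[i] for i in selected]
```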
Fig.5  First 24 3D key poses as returned by the algorithm in Table 1 (only the 3D point sets of the subject are shown)
Fig.6  Sketch map of 3D key pose sequence extraction from 3D-key-pose set
Fig.7  Projection images of one 3D key pose
Fig.8  Flow chart of normalization of silhouette image
Step 1: Translate the point set $S=\{(s_x(n),s_y(n))\}_{n=1}^{N_S}$ to the origin to obtain $S_1=\{(s_x^1(n),s_y^1(n))\}_{n=1}^{N_S}$, where $s_x^1(n)=s_x(n)-\min_{1\le n\le N_S}\{s_x(n)\}$ and $s_y^1(n)=s_y(n)-\min_{1\le n\le N_S}\{s_y(n)\}$;
Step 2: Scale the height of the subject to 60 pixels to obtain $S_2=\{(s_x^2(n),s_y^2(n))\}_{n=1}^{N_S}$, where $s_x^2(n)=z\times s_x^1(n)$, $s_y^2(n)=z\times s_y^1(n)$, and $z$ is the zoom parameter;
Step 3: Normalize the silhouette image to $80\times 60$ pixels. Let $\bar{S}=\{(\bar{s}_x(n),\bar{s}_y(n))\}_{n=1}^{N_S}$ be the normalized silhouette image. Then $\bar{s}_x(n)=s_x^2(n)+30$ and $\bar{s}_y(n)=s_y^2(n)-\max_{1\le n\le N_S}\{s_y^2(n)\}+70$ (a code sketch follows Table 2).
Tab.2  Normalization method of silhouette image
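The following small Python sketch implements the normalization in Table 2, assuming the silhouette is given as an $(N_S, 2)$ array of $(x, y)$ contour points; the constants 60, 30, and 70 are taken directly from Table 2.

```python
import numpy as np

def normalize_silhouette(points):
    """Normalize a silhouette point set into the 80x60 frame (Table 2 sketch).

    points: array of shape (N_S, 2) holding (s_x, s_y) coordinates.
    """
    pts = np.asarray(points, dtype=float)
    # Step 1: translate the point set so its bounding box touches the origin.
    pts = pts - pts.min(axis=0)
    # Step 2: scale the subject so its height becomes 60 pixels.
    z = 60.0 / pts[:, 1].max()
    pts = pts * z
    # Step 3: place the subject inside the 80x60 frame: shift x by +30 and
    # align y so that max(y) maps to 70, following the offsets in Table 2.
    pts[:, 0] += 30.0
    pts[:, 1] += 70.0 - pts[:, 1].max()
    return pts
```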
Fig.9  Undirected Chamfer distance of silhouette images and body type parameter
Step 1: Calculate the undirected Chamfer distances between the projection image $p_m^k$ and the projection images of the other 3D key poses: $D(p_m^k,p_n^i)=\bar{d}_C(p_m^k,p_n^i)$ ($1\le k,i\le N_{\mathrm{key}}$, $1\le m,n\le M$, $i\ne k$);
Step 2: Obtain the weight-related matrix $DD(k,m)=\min_{1\le i\le N_{\mathrm{key}},\,i\ne k}\;\min_{1\le n\le M}D(p_m^k,p_n^i)$, where $DD(k,m)$ is the smallest distance between $p_m^k$ and the projection images of the other 3D key poses;
Step 3: Normalize the weight-related matrix: $\overline{DD}(k,m)=DD(k,m)/\max_{1\le i\le N_{\mathrm{key}},\,1\le n\le M}DD(i,n)$;
Step 4: Normalize the weight-related values of the same 3D key pose: $\overline{DD}^{(k)}(k,m)=\overline{DD}(k,m)-\max_{1\le m\le M}\overline{DD}(k,m)+1$ ($1\le k\le N_{\mathrm{key}}$);
Step 5: Obtain the confidence weight: $\omega_m^k=1+\exp[\alpha\times\overline{DD}^{(k)}(k,m)]$ ($\alpha>0$) (a code sketch follows Table 3).
Tab.3  Confidence weight calculation
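A Python sketch of Table 3 follows. The section does not restate the definition of $\bar{d}_C$, so the `chamfer` helper assumes the usual symmetric (undirected) Chamfer distance between contour point sets, i.e., the mean nearest-neighbour distance taken in both directions; `alpha` is the free parameter of Step 5.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer(a, b):
    """Assumed undirected Chamfer distance between two (N, 2) point sets."""
    d_ab = cKDTree(b).query(a)[0].mean()  # a -> nearest neighbours in b
    d_ba = cKDTree(a).query(b)[0].mean()  # b -> nearest neighbours in a
    return 0.5 * (d_ab + d_ba)

def confidence_weights(proj, alpha=1.0):
    """Confidence weights of Table 3. proj[k][m] is projection image m of
    3D key pose k, given as an (N, 2) contour point array; alpha > 0."""
    n_key, M = len(proj), len(proj[0])
    DD = np.zeros((n_key, M))
    for k in range(n_key):
        for m in range(M):
            # Steps 1-2: smallest Chamfer distance from p_m^k to any
            # projection image of a *different* key pose.
            DD[k, m] = min(chamfer(proj[k][m], proj[i][n])
                           for i in range(n_key) if i != k
                           for n in range(M))
    # Step 3: global normalization by the largest entry.
    DDn = DD / DD.max()
    # Step 4: per-key-pose normalization so each row peaks at 1.
    DDk = DDn - DDn.max(axis=1, keepdims=True) + 1.0
    # Step 5: confidence weight.
    return 1.0 + np.exp(alpha * DDk)
```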
Fig.10  Examples of confidence weights
Fig.11  Flow chart of action recognition
Fig.12  Feature extraction sketch of exemplar-based embedding
Fig.13  Samples of unique key poses
Fig.14  Samples of common key poses
Fig.15  Samples of images and 3D volume data in IXMAS dataset
Fig.16  Twelve subjects in IXMAS dataset
Fig.17  Images simultaneously captured from four cameras
Fig.18  Recognition rate versus count of 3D key poses
Fig.19  Examples of 3D key pose sequences (single camera)
Fig.20  Examples of 3D key pose (single camera and multiple cameras)
Fig.21  Examples of 3D key pose sequences (four-camera fusion)
Fig.22  Effect of confidence weight (four-camera fusion)
Fig.23  Recognition rates per camera
camera count | camera label | recognition rate
1 | cam 1 | 78%
1 | cam 2 | 78%
1 | cam 3 | 67%
1 | cam 4 | 78%
1 | average | 75%
2 | cam 1, cam 2 | 84%
2 | cam 1, cam 3 | 82%
2 | cam 1, cam 4 | 87%
2 | cam 2, cam 3 | 80%
2 | cam 2, cam 4 | 88%
2 | cam 3, cam 4 | 80%
2 | average | 83%
3 | cam 1, cam 2, cam 3 | 87%
3 | cam 1, cam 2, cam 4 | 88%
3 | cam 1, cam 3, cam 4 | 88%
3 | cam 2, cam 3, cam 4 | 85%
3 | average | 87%
4 | cam 1, cam 2, cam 3, cam 4 | 93%
Tab.4  Recognition rates versus camera count
algorithm | action count | subject count | database | representation | camera count | recognition rate
Wang et al. [21] | 10 | 9 | Weizmann | MMS & AME | 1 | 96.7%
Gorelick et al. [22] | 10 | 9 | Weizmann | space-time shapes | 1 | 97.8%
Weinland et al. [19] | 10 | 9 | Weizmann | silhouette | 1 | 100%
Gu et al. [12] | 11 | 12 | IXMAS | 3D human joint sequence | 3D | 94.4%
Weinland et al. [6] | 11 | 10 | IXMAS | MHV template | 3D | 93.3%
Lv et al. [7] | 22 | - | MoCap | 3D human joint sequence | 3D | 92.1%
Davis et al. [23] | 18 | 1 | - | MEI & MHI | 2 | 83.3%
Ahmad et al. [16] | 7 | 11 | KUGDB | optic flow and shape flow | 3 | 88.3%
Natarajan et al. [17] | 6 | 4 | - | optic flow and shape flow | 1 | 78.85%
Weinland et al. [8] | 11 | 10 | IXMAS | silhouette | 4 | 81.3%
Yan et al. [9] | 11 | 12 | IXMAS | STVs | 4 | 78%
Liu et al. [10] | 13 | 12 | IXMAS | STVs & spin-images | 4 | 78.5%
our approach | 11 | 12 | IXMAS | 3D key pose sequence | 4 | 93%
Tab.5  Comparison of the proposed approach with previous approaches
algorithm | cam 1 | cam 2 | cam 3 | cam 4 | cam 2, cam 4 | cam 1, cam 2, cam 3 | cam 1, cam 2, cam 3, cam 4
Weinland et al. [8] | 65.4% | 70.0% | 54.3% | 66.0% | - | 81.3% | 81.3%
Yan et al. [9] | 72% | 53% | 68% | 63% | 71% | 60% | 78%
Liu et al. [10] | 73.46% | 72.74% | 69.62% | 70.94% | - | - | 78.5%
our approach | 78% | 78% | 67% | 78% | 88% | 87% | 93%
Tab.6  Comparison of the proposed approach with previous approaches (IXMAS dataset)
1 Yilmaz A, Shah M. Matching actions in presence of camera motion. Computer Vision and Image Understanding, 2006, 104(2-3): 221-231. doi:10.1016/j.cviu.2006.07.012
2 Poppe R. Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 2007, 108(1-2): 4-18. doi:10.1016/j.cviu.2006.10.016
3 Shen Y, Ashraf N, Foroosh H. Action recognition based on homography constraints. In: Proceedings of International Conference on Pattern Recognition. 2008, 1-4
4 Souvenir R, Babbs J. Learning the viewpoint manifold for action recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-7
5 Ahmad M, Lee S. HMM-based human action recognition using multiview image sequences. In: Proceedings of International Conference on Pattern Recognition. 2006, 1: 263-266
6 Weinland D, Ronfard R, Boyer E. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 2006, 104(2): 249-257. doi:10.1016/j.cviu.2006.07.013
7 Lv F, Nevatia R. Recognition and segmentation of 3-D human action using HMM and multi-class AdaBoost. In: Proceedings of European Conference on Computer Vision. 2006, 359-372
8 Weinland D, Boyer E, Ronfard R. Action recognition from arbitrary views using 3D exemplars. In: Proceedings of IEEE International Conference on Computer Vision. 2007, 1-7
9 Yan P, Khan S M, Shah M. Learning 4D action feature models for arbitrary view action recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-7
10 Liu J, Ali S, Shah M. Recognizing human actions using multiple features. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-8
11 Johansson G. Visual motion perception. Scientific American, 1975, 232(6): 76-88. doi:10.1038/scientificamerican0675-76
12 Gu J, Ding X, Wang S, Wu Y. Action and gait recognition from recovered 3-D human joints. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2010, 40(4): 1021-1033
13 Cheung K M G, Baker S, Kanade T. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2003, 1: 77-84
14 Parameswaran V, Chellappa R. View invariance for human action recognition. International Journal of Computer Vision, 2006, 66(1): 83-101. doi:10.1007/s11263-005-3671-4
15 Gritai A, Sheikh Y, Shah M. On the use of anthropometry in the invariant analysis of human actions. In: Proceedings of International Conference on Pattern Recognition. 2004, 2: 923-926
16 Ahmad M, Lee S W. Human action recognition using shape and CLG-motion flow from multi-view image sequences. Pattern Recognition, 2008, 41(7): 2237-2252. doi:10.1016/j.patcog.2007.12.008
17 Natarajan P, Nevatia R. View and scale invariant action recognition using multiview shape-flow models. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-8
18 Lv F, Nevatia R. Single view human action recognition using key pose matching and Viterbi path searching. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2007, 1-8
19 Weinland D, Boyer E. Action recognition using exemplar-based embedding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-7
20 Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, 77(2): 257-286. doi:10.1109/5.18626
21 Wang L, Suter D. Informative shape representations for human action recognition. In: Proceedings of International Conference on Pattern Recognition. 2006, 1266-1269
22 Gorelick L, Blank M, Shechtman E, Irani M, Basri R. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(12): 2247-2253. doi:10.1109/TPAMI.2007.70711
23 Davis J W, Bobick A F. The representation and recognition of human movement using temporal templates. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 1997, 928-934