Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2020, Vol. 14 Issue (4): 144701   https://doi.org/10.1007/s11704-019-8266-2
Multipath affinage stacked-hourglass networks for human pose estimation
Guoguang HUA1, Lihong LI1, Shiguang LIU2
1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, China
2. School of Computer Science and Technology, Division of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Abstract

Stacked hourglass networks have recently shown outstanding performance in human pose estimation. However, the repeated bottom-up and top-down strided convolution operations in deep convolutional neural networks significantly reduce the initial image resolution. To address this problem, we propose incorporating an affinage module and a residual attention module into the stacked hourglass network for human pose estimation. The affinage module is a novel network architecture that replaces the up-sampling operation of the stacked hourglass network in order to obtain high-resolution features, and it is critical to improving the network's performance. We also propose a novel residual attention module to strengthen the supervision of the up-sampling process. The effectiveness of the proposed modules is evaluated on standard benchmarks. Experimental results demonstrate that our method achieves more accurate and more robust human pose estimation in images with complex backgrounds.

Keywords: human pose estimation, stacked hourglass network, affinage module, residual attention module
Received: 2018-08-01      Published: 2020-03-11
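The resolution loss described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' code. It assumes a 256-pixel input side and 3x3, stride-2, padding-1 layers (hypothetical values chosen only for illustration), and applies the standard convolution output-size formula to show how quickly repeated strided operations shrink a feature map.

```python
from typing import List


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after one convolution or pooling layer (floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_trace(size: int, steps: int, kernel: int = 3,
                     stride: int = 2, padding: int = 1) -> List[int]:
    """Feature-map side length after each of `steps` strided layers.

    The default kernel/stride/padding values are illustrative assumptions,
    not settings taken from the paper.
    """
    trace = [size]
    for _ in range(steps):
        size = conv_output_size(size, kernel, stride, padding)
        trace.append(size)
    return trace


if __name__ == "__main__":
    # Assumed 256-pixel input side, four stride-2 layers.
    print(downsample_trace(256, 4))  # [256, 128, 64, 32, 16]
```

Under these assumptions, each stride-2 layer halves the side length, so four layers leave only 1/16 of the original resolution per side. This is the fine spatial detail that an up-sampling path has to recover.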
Corresponding Author(s): Shiguang LIU   
Cite this article:
Guoguang HUA, Lihong LI, Shiguang LIU. Multipath affinage stacked-hourglass networks for human pose estimation. Front. Comput. Sci., 2020, 14(4): 144701.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-019-8266-2
https://academic.hep.com.cn/fcs/CN/Y2020/V14/I4/144701