Frontiers of Computer Science


Front. Comput. Sci.    2023, Vol. 17 Issue (1) : 171304    https://doi.org/10.1007/s11704-021-1207-x
RESEARCH ARTICLE
Scene-adaptive crowd counting method based on meta learning with dual-input network DMNet
Haoyu ZHAO1, Weidong MIN2,3(), Jianqiang XU1, Qi WANG1, Yi ZOU1, Qiyan FU1
1. School of Information Engineering, Nanchang University, Nanchang 330031, China
2. School of Software, Nanchang University, Nanchang 330047, China
3. Jiangxi Key Laboratory of Smart City, Nanchang 330047, China
Abstract

Crowd counting, which aims to count the number of people in crowded scenes, has recently become a hot research topic. Existing methods mainly follow a training-testing pattern and rely on large-scale training data, so they fail to count the crowd accurately in real-world scenes because of the limited generalization capability of the models. To alleviate this issue, a scene-adaptive crowd counting method based on meta-learning with a Dual-illumination Merging Network (DMNet) is proposed in this paper. Built on learning-to-learn and few-shot learning, the proposed method can adapt to new scenes that contain only a few labeled images. To generate high-quality density maps and count the crowd in low-lighting scenes, the DMNet is proposed, which contains a Multi-scale Feature Extraction module and an Element-wise Fusion Module. The Multi-scale Feature Extraction module extracts image features with multi-scale convolutions, which helps to improve network accuracy. The Element-wise Fusion Module fuses the low-lighting features with the illumination-enhanced features, which compensates for the illumination missing in low-lighting environments. Experimental results on the WorldExpo'10, DISCO, UCSD, and Mall benchmarks show that the proposed method outperforms existing state-of-the-art methods in accuracy and achieves satisfactory results.
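The dual-input design can be made concrete with a short sketch. The following PyTorch-style code, written for illustration only, shows how a network like DMNet might take a low-light image and its illumination-enhanced counterpart, extract features from each, fuse them element-wise, and regress a density map whose sum is the crowd count. The class name DualInputCounter, the channel sizes, and the layer choices are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DualInputCounter(nn.Module):
    """Illustrative dual-input counting network (channel sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        # Separate frontends for the dark image and its enhanced counterpart.
        self.frontend_dark = self._make_frontend()
        self.frontend_enh = self._make_frontend()
        # Backend regresses a single-channel density map from the fused features.
        self.backend = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    @staticmethod
    def _make_frontend():
        return nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, img_dark, img_enhanced):
        f_dark = self.frontend_dark(img_dark)
        f_enh = self.frontend_enh(img_enhanced)
        fused = f_dark + f_enh          # element-wise fusion (simplest variant)
        return self.backend(fused)      # density map; count = output.sum()

# Usage: density = model(dark, enhanced); count = density.sum().item()
```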

Keywords crowd counting      meta-learning      scene-adaptive      Dual-illumination Merging Network     
Corresponding Author(s): Weidong MIN   
Just Accepted Date: 06 August 2021   Issue Date: 01 March 2022
 Cite this article:   
Haoyu ZHAO, Weidong MIN, Jianqiang XU, et al. Scene-adaptive crowd counting method based on meta learning with dual-input network DMNet[J]. Front. Comput. Sci., 2023, 17(1): 171304.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-021-1207-x
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I1/171304
Fig.1  The scene-adaptive crowd counting method with meta-learning. The meta-training step trains a pre-trained model. The meta-testing step fine-tunes the pre-trained model for new scenes
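The meta-training/meta-testing split in Fig.1 follows the learning-to-learn recipe of MAML [8]: each camera scene is treated as a task, the model is adapted on K labeled images from that scene, and the shared initialization is updated so that this adaptation transfers. Below is a minimal first-order sketch with a single inner step; the function name meta_train_step, the learning rates, and the density-map MSE loss are assumptions about details not given in this abstract.

```python
import copy
import torch
import torch.nn.functional as F

def meta_train_step(model, scene_tasks, inner_lr=1e-4, meta_lr=1e-5):
    """One first-order meta-update over a batch of scene tasks (MAML-style sketch)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support, query in scene_tasks:            # each task = one camera scene
        learner = copy.deepcopy(model)            # adaptation stays task-local
        opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        # Inner loop: adapt on the K labeled support images of this scene.
        imgs, gt = support                        # imgs = (dark, enhanced) batch
        loss = F.mse_loss(learner(*imgs), gt)     # density-map regression loss
        opt.zero_grad(); loss.backward(); opt.step()
        # Outer signal: evaluate the adapted learner on held-out query images.
        q_imgs, q_gt = query
        opt.zero_grad()
        F.mse_loss(learner(*q_imgs), q_gt).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad / len(scene_tasks)        # first-order gradient estimate
    # Apply the averaged task gradients to the shared initialization.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= meta_lr * g
```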
Fig.2  The structure of the proposed DMNet
Fig.3  The process of image illumination enhancement
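For intuition about the enhancement step in Fig.3 (the comparison in Fig.4 suggests the paper adopts DUAL [35]), Retinex-style enhancers such as LIME [38] estimate a per-pixel illumination map from the channel-wise maximum and divide it out. The NumPy sketch below captures only this basic idea; it omits the edge-preserving refinement used by LIME and the second, inverted-image illumination estimate that distinguishes DUAL.

```python
import numpy as np

def enhance_illumination(img, eps=1e-3, gamma=0.8):
    """Crude Retinex-style enhancement: divide out an estimated illumination map.

    img: float array in [0, 1] of shape (H, W, 3). The illumination-map
    refinement used by LIME/DUAL is intentionally omitted in this sketch.
    """
    illum = img.max(axis=2, keepdims=True)        # initial illumination estimate
    illum = np.clip(illum, eps, 1.0) ** gamma     # gamma curve brightens shadows
    return np.clip(img / illum, 0.0, 1.0)         # reflectance-like output
```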
Fig.4  Illumination enhancement results with different approaches. The first row shows the original images. The second row shows the results of KinD [36]. The third row shows the results of RetinexNet [37]. The fourth row shows the results of LIME [38]. The fifth row shows the results of DUAL [35]
Fig.5  The structure of the proposed Multi-scale Feature Extraction module
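The caption does not spell out the module's internals, but a common realization of multi-scale feature extraction is an Inception-style block that applies parallel convolutions with different kernel sizes to the same input and concatenates the results, so that heads of different apparent sizes are covered. A sketch under that assumption (branch widths and kernel sizes are illustrative):

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Inception-style multi-scale block (branch sizes are illustrative)."""
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2)
            for k in (1, 3, 5, 7)                 # four receptive-field scales
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Each branch sees the same input; concatenation keeps all scales.
        return self.relu(torch.cat([b(x) for b in self.branches], dim=1))
```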
Fig.6  The structure of the proposed Element-wise Fusion Module
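Per the abstract, the Element-wise Fusion Module combines the low-lighting features with the illumination-enhanced features element-wise. A minimal sketch, assuming element-wise addition followed by a learnable 1×1 convolution to re-mix channels (the exact operations are not specified here):

```python
import torch.nn as nn

class ElementwiseFusion(nn.Module):
    """Fuse two same-shape feature maps element-wise, then re-mix channels."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Conv2d(channels, channels, 1)  # 1x1 re-mixing conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feat_dark, feat_enhanced):
        fused = feat_dark + feat_enhanced            # element-wise addition
        return self.relu(self.mix(fused))
```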
Fig.7  Visualization of the features in the frontend network. The first column shows the result for the image in darkness. The second column shows the result for the illumination-enhanced image. The third column shows the result for the illumination-enhanced image after a grayscale operation
Fig.8  Example images from the four datasets. The images in the black dotted box are from DISCO. The images in the red dotted box are from WorldExpo'10. The images in the orange dotted box are from UCSD. The images in the blue dotted box are from Mall
| Target | Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|---|
| Scene 1 | Meta+CSRNet | 19.57 | 26.17 | 21.79 | 28.35 |
| Scene 1 | CANNet Pre- | 20.72 | 21.96 | 21.49 | 24.27 |
| Scene 1 | Meta+CANNet | 18.69 | 22.20 | 18.83 | 22.34 |
| Scene 1 | DMNet Pre- | 18.96 | 22.64 | 20.66 | 23.89 |
| Scene 1 | Ours | 18.45 | 21.56 | 18.14 | 21.19 |
| Scene 2 | Meta+CSRNet | 20.39 | 31.86 | 20.77 | 32.44 |
| Scene 2 | CANNet Pre- | 20.67 | 23.17 | 23.06 | 25.09 |
| Scene 2 | Meta+CANNet | 17.60 | 28.54 | 18.69 | 22.20 |
| Scene 2 | DMNet Pre- | 22.55 | 27.23 | 18.57 | 29.77 |
| Scene 2 | Ours | 16.65 | 27.61 | 16.66 | 27.48 |
| Scene 3 | Meta+CSRNet | 16.82 | 27.11 | 20.94 | 31.51 |
| Scene 3 | CANNet Pre- | 26.66 | 32.51 | 18.39 | 28.83 |
| Scene 3 | Meta+CANNet | 16.04 | 25.52 | 16.13 | 24.75 |
| Scene 3 | DMNet Pre- | 24.28 | 28.70 | 19.21 | 29.09 |
| Scene 3 | Ours | 16.47 | 25.29 | 16.09 | 24.69 |
| Scene 4 | Meta+CSRNet | 22.63 | 27.33 | 21.35 | 25.53 |
| Scene 4 | CANNet Pre- | 22.61 | 24.01 | 22.83 | 26.88 |
| Scene 4 | Meta+CANNet | 21.52 | 27.41 | 21.34 | 27.00 |
| Scene 4 | DMNet Pre- | 22.69 | 23.98 | 23.27 | 25.75 |
| Scene 4 | Ours | 21.30 | 23.76 | 21.18 | 25.30 |
| Scene 5 | Meta+CSRNet | 27.49 | 33.82 | 34.86 | 42.22 |
| Scene 5 | CANNet Pre- | 29.35 | 34.88 | 28.94 | 34.72 |
| Scene 5 | Meta+CANNet | 28.36 | 34.80 | 28.36 | 34.79 |
| Scene 5 | DMNet Pre- | 30.49 | 36.12 | 29.63 | 35.95 |
| Scene 5 | Ours | 27.10 | 33.45 | 27.35 | 33.80 |
| Average | Meta+CSRNet | 21.38 | 29.26 | 23.94 | 32.01 |
| Average | CANNet Pre- | 24.00 | 27.30 | 22.94 | 27.96 |
| Average | Meta+CANNet | 20.44 | 27.69 | 20.64 | 26.21 |
| Average | DMNet Pre- | 23.79 | 27.73 | 22.27 | 28.89 |
| Average | Ours | 19.99 | 26.33 | 19.88 | 26.49 |

Tab.1  Performance of different models with 1 and 5 shots in five scenes on WorldExpo'10
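The MAE and RMSE columns in Tables 1−7 are the standard counting metrics: the mean absolute error and the root mean squared error between predicted and ground-truth per-image counts. For reference, a minimal implementation:

```python
import numpy as np

def counting_errors(pred_counts, gt_counts):
    """MAE and RMSE over per-image predicted vs. ground-truth crowd counts."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.mean(np.abs(pred - gt))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return mae, rmse
```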
| Target | Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|---|
| Scene 1 | Meta+CSRNet | 153.65 | 210.45 | 149.91 | 200.15 |
| Scene 1 | CANNet Pre- | 132.46 | 160.22 | 129.71 | 156.56 |
| Scene 1 | Meta+CANNet | 126.40 | 161.32 | 127.58 | 160.68 |
| Scene 1 | DMNet Pre- | 125.02 | 154.78 | 121.92 | 150.59 |
| Scene 1 | Ours | 106.30 | 143.61 | 104.40 | 141.75 |
| Scene 2 | Meta+CSRNet | 66.39 | 103.46 | 64.08 | 100.70 |
| Scene 2 | CANNet Pre- | 83.56 | 109.78 | 86.17 | 114.83 |
| Scene 2 | Meta+CANNet | 45.36 | 96.19 | 42.53 | 89.44 |
| Scene 2 | DMNet Pre- | 57.64 | 70.29 | 54.26 | 71.30 |
| Scene 2 | Ours | 40.14 | 86.72 | 42.50 | 89.16 |
| Scene 3 | Meta+CSRNet | 53.64 | 64.29 | 50.54 | 61.30 |
| Scene 3 | CANNet Pre- | 65.72 | 74.20 | 67.51 | 74.36 |
| Scene 3 | Meta+CANNet | 63.18 | 70.45 | 61.58 | 69.25 |
| Scene 3 | DMNet Pre- | 58.37 | 67.60 | 60.55 | 70.12 |
| Scene 3 | Ours | 54.15 | 67.00 | 59.49 | 68.31 |
| Scene 4 | Meta+CSRNet | 34.64 | 43.89 | 33.31 | 42.11 |
| Scene 4 | CANNet Pre- | 35.03 | 41.78 | 35.56 | 42.86 |
| Scene 4 | Meta+CANNet | 31.25 | 39.32 | 30.86 | 38.56 |
| Scene 4 | DMNet Pre- | 28.33 | 33.14 | 29.57 | 34.97 |
| Scene 4 | Ours | 27.01 | 36.78 | 26.02 | 32.56 |
| Scene 5 | Meta+CSRNet | 78.11 | 86.34 | 72.26 | 80.77 |
| Scene 5 | CANNet Pre- | 40.90 | 51.37 | 38.88 | 48.31 |
| Scene 5 | Meta+CANNet | 24.19 | 29.06 | 27.72 | 31.05 |
| Scene 5 | DMNet Pre- | 33.71 | 42.50 | 32.24 | 41.76 |
| Scene 5 | Ours | 20.15 | 32.99 | 21.58 | 33.17 |
| Average | Meta+CSRNet | 77.29 | 101.69 | 74.02 | 97.01 |
| Average | CANNet Pre- | 71.53 | 87.47 | 71.57 | 87.38 |
| Average | Meta+CANNet | 58.08 | 79.27 | 58.05 | 77.80 |
| Average | DMNet Pre- | 60.61 | 73.66 | 59.71 | 73.75 |
| Average | Ours | 49.55 | 73.42 | 50.80 | 72.99 |

Tab.2  Performance of different models with 1 and 5 shots in five scenes on DISCO
| Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|
| Meta+CSRNet | 5.63 | 6.92 | 5.25 | 6.63 |
| CANNet Pre- | 4.59 | 5.48 | 4.21 | 5.37 |
| Meta+CANNet | 4.26 | 5.42 | 3.95 | 4.92 |
| DMNet Pre- | 4.31 | 5.14 | 4.26 | 5.26 |
| Ours | 4.18 | 4.92 | 4.01 | 5.17 |

Tab.3  Performance of different models with 1 and 5 shots on UCSD
| Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|
| Meta+CSRNet | 4.69 | 5.56 | 4.52 | 5.36 |
| CANNet Pre- | 3.62 | 4.51 | 3.36 | 4.20 |
| Meta+CANNet | 3.23 | 4.45 | 3.17 | 4.39 |
| DMNet Pre- | 3.50 | 4.67 | 3.11 | 4.02 |
| Ours | 3.19 | 4.49 | 3.05 | 4.13 |

Tab.4  Performance of different models with 1 and 5 shots on Mall
| Scene | Metric | CSRNet [39] | CANNet [40] | MSR-FAN | Ours |
|---|---|---|---|---|---|
| Scene 1 | MAE | 24.62 | 19.44 | 14.38 | 18.14 |
| Scene 1 | RMSE | 29.21 | 22.59 | 17.25 | 21.19 |
| Scene 2 | MAE | 25.14 | 19.91 | 15.28 | 16.65 |
| Scene 2 | RMSE | 29.63 | 30.69 | 26.86 | 27.61 |
| Scene 3 | MAE | 28.70 | 22.07 | 22.19 | 16.09 |
| Scene 3 | RMSE | 39.21 | 31.49 | 33.27 | 24.69 |
| Scene 4 | MAE | 20.72 | 10.72 | 13.19 | 21.18 |
| Scene 4 | RMSE | 26.45 | 13.89 | 17.55 | 25.30 |
| Scene 5 | MAE | 46.89 | 39.38 | 37.38 | 27.10 |
| Scene 5 | RMSE | 50.10 | 46.52 | 46.44 | 33.45 |
| Average | MAE | 29.21 | 22.30 | 20.48 | 19.88 |
| Average | RMSE | 34.92 | 29.04 | 28.27 | 26.49 |

Tab.5  Performance of different crowd counting models on scenes 1−5 in Table 1
| Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|
| Meta-DMNet-no-MFE | 10.76 | 13.49 | 9.43 | 11.75 |
| Meta-DMNet | 4.18 | 4.92 | 4.01 | 5.17 |

Tab.6  The influence of the MFE module in the proposed method on the UCSD dataset
| Method | MAE (K=1) | RMSE (K=1) | MAE (K=5) | RMSE (K=5) |
|---|---|---|---|---|
| Meta-DMNet-2 | 8.36 | 10.48 | 9.42 | 11.25 |
| Meta-DMNet-3 | 6.46 | 7.52 | 6.02 | 7.61 |
| Meta-DMNet-4 | 4.18 | 4.92 | 4.01 | 5.17 |
| Meta-DMNet-5 | 4.63 | 5.28 | 4.50 | 5.29 |
| Meta-DMNet-6 | 4.50 | 5.11 | 4.96 | 5.33 |

Tab.7  Experimental results of the proposed method with different numbers of EFMs
1 Q Wang, J Gao, W Lin, X Li. NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(6): 2141−2149
2 Y Liu, Q Wen, H Chen, W Liu, J Qin, G Han, S He. Crowd counting via cross-stage refinement networks. IEEE Transactions on Image Processing, 2020, 29: 6800−6812
3 J Gao, Q Wang, X Li. PCC Net: perspective crowd counting via spatial convolutional network. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(10): 3486−3498
4 M K K Reddy, M A Hossain, M Rochan, Y Wang. Few-shot scene adaptive crowd counting using meta-learning. In: Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). 2020, 2803−2812
5 X Liu, J Van De Weijer, A D Bagdanov. Leveraging unlabeled data for crowd counting by learning to rank. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7661−7669
6 C Zhang, H Li, X Wang, X Yang. Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015, 833−841
7 C C Loy, S Gong, T Xiang. From semi-supervised to transfer counting of crowds. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. 2013, 2256−2263
8 C Finn, P Abbeel, S Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1126−1135
9 M Zhao, C Zhang, J Zhang, F Porikli, B Ni, W Zhang. Scale-aware crowd counting via depth-embedded convolutional neural networks. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(10): 3651−3662
10 Y Fang, S Gao, J Li, W Luo, L He, B Hu. Multi-level feature fusion based Locality-Constrained Spatial Transformer network for video crowd counting. Neurocomputing, 2020, 392: 98−107
11 D B Sam, S V Peri, M N Sundararaman, A Kamath, R V Babu. Locate, size, and count: accurately resolving people in dense crowds via detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2739−2751
12 L Liu, H Lu, H Xiong, K Xian, Z Cao, C Shen. Counting objects by blockwise classification. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(10): 3513−3527
13 X Wu, Y Zheng, H Ye, W Hu, T Ma, J Yang, L He. Counting crowds with varying densities via adaptive scenario discovery framework. Neurocomputing, 2020, 397: 127−138
14 D Hu, L Mou, Q Wang, J Gao, Y Hua, D Dou, X X Zhu. Ambient sound helps: audiovisual crowd counting in extreme conditions. 2020, arXiv preprint arXiv: 2005.07097
15 H Zhao, W Min, X Wei, Q Wang, Q Fu, Z Wei. MSR-FAN: multi-scale residual feature-aware network for crowd counting. IET Image Processing, 2021, 15(14): 3512−3521. https://doi.org/10.1049/ipr2.12175
16 H Zheng, Z Lin, J Cen, Z Wu, Y Zhao. Cross-line pedestrian counting based on spatially-consistent two-stage local crowd density estimation and accumulation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(3): 787−799
17 Z Shen, Y Xu, B Ni, M Wang, J Hu, X Yang. Crowd counting via adversarial cross-scale consistency pursuit. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 5245−5254
18 B Yang, W Zhan, N Wang, X Liu, J Lv. Counting crowds using a scale-distribution-aware network and adaptive human-shaped kernel. Neurocomputing, 2020, 390: 207−216
19 Z Zou, Y Cheng, X Qu, S Ji, X Guo, P Zhou. Attend to count: crowd counting with adaptive capacity multi-scale CNNs. Neurocomputing, 2019, 367: 75−83
20 L Wang, B Yin, X Tang, Y Li. Removing background interference for crowd counting via de-background detail convolutional network. Neurocomputing, 2019, 322: 360−371
21 J Chen, Z Wang. Crowd counting with segmentation attention convolutional neural network. IET Image Processing, 2021, 15(6): 1221−1231. https://doi.org/10.1049/ipr2.12099
22 S Jiang, X Lu, Y Lei, L Liu. Mask-aware networks for crowd counting. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(9): 3119−3129
23 W Min, M Fan, X Guo, Q Han. A new approach to track multiple vehicles with the combination of robust detection and two classifiers. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 174−186
24 H Yang, L Liu, W Min, X Yang, X Xiong. Driver yawning detection based on subtle facial action recognition. IEEE Transactions on Multimedia, 2020, 23: 572−583. https://doi.org/10.1109/TMM.2020.2985536
25 Q Wang, W Min, D He, S Zou, T Huang, Y Zhang, R Liu. Discriminative fine-grained network for vehicle re-identification using two-stage re-ranking. Science China Information Sciences, 2020, 63(11): 212102. https://doi.org/10.1007/s11432-019-2811-8
26 Y Ma, G Zhong, W Liu, Y Wang, P Jiang, R Zhang. ML-CGAN: conditional generative adversarial network with a meta-learner structure for high-quality image generation with few training data. Cognitive Computation, 2021, 13(2): 418−430
27 I Jung, K You, H Noh, M Cho, B Han. Real-time object tracking via meta-learning: efficient model adaptation and one-shot channel pruning. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 11205−11212
28 T Elsken, B Staffler, J H Metzen, F Hutter. Meta-learning of neural architectures for few-shot learning. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, 12362−12372
29 C Xu, J Shen, X Du. A method of few-shot network intrusion detection based on meta-learning framework. IEEE Transactions on Information Forensics and Security, 2020, 15: 3540−3552
30 H J Ye, X R Sheng, D C Zhan. Few-shot learning with adaptively initialized task optimizer: a practical meta-learning approach. Machine Learning, 2020, 109(3): 643−664
31 A Nichol, J Achiam, J Schulman. On first-order meta-learning algorithms. 2018, arXiv preprint arXiv: 1803.02999v3
32 D Wang, Y Cheng, M Yu, X Guo, T Zhang. A hybrid approach with optimization-based and metric-based meta-learner for few-shot learning. Neurocomputing, 2019, 349: 202−211
33 N Lai, M Kan, C Han, X Song, S Shan. Learning to learn adaptive classifier–predictor for few-shot learning. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(8): 3458−3470. https://doi.org/10.1109/TNNLS.2020.3011526
34 A B Chan, Z S J Liang, N Vasconcelos. Privacy preserving crowd monitoring: counting people without people models or tracking. In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1−7
35 Q Zhang, Y Nie, W S Zheng. Dual illumination estimation for robust exposure correction. Computer Graphics Forum, 2019, 38(7): 243−252
36 Y Zhang, J Zhang, X Guo. Kindling the darkness: a practical low-light image enhancer. In: Proceedings of the 27th ACM International Conference on Multimedia. 2019, 1632−1640
37 C Wei, W Wang, W Yang, J Liu. Deep Retinex decomposition for low-light enhancement. 2018, arXiv preprint arXiv: 1808.04560
38 X Guo, Y Li, H Ling. LIME: low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 2017, 26(2): 982−993
39 Y Li, X Zhang, D Chen. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 1091−1100
40 W Liu, M Salzmann, P Fua. Context-aware crowd counting. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, 5094−5103
41 J Chu, Z Guo, L Leng. Object detection based on multi-layer convolution feature fusion and online hard example mining. IEEE Access, 2018, 6: 19959−19967
42 Y Zhang, J Chu, L Leng, J Miao. Mask-Refined R-CNN: a network for refining object details in instance segmentation. Sensors, 2020, 20(4): 1010
43 Y Zhang, D Zhou, S Chen, S Gao, Y Ma. Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, 589−597