Frontiers of Computer Science

Front. Comput. Sci.    2023, Vol. 17 Issue (6) : 176338    https://doi.org/10.1007/s11704-022-2126-1
Artificial Intelligence
A feature-wise attention module based on the difference with surrounding features for convolutional neural networks
Shuo TAN, Lei ZHANG, Xin SHU, Zizhou WANG
Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
Abstract

The attention mechanism has become a widely researched method for improving the performance of convolutional neural networks (CNNs). Most research focuses on designing channel-wise and spatial-wise attention modules, but neglects the unique information carried by each individual feature, which is critical for deciding both “what” and “where” to focus. In this paper, a feature-wise attention module is proposed that assigns an attention weight to every feature of the input feature map. Specifically, the module is based on the well-known surround suppression mechanism from neuroscience, and it consists of two sub-modules: a Minus-Square-Add (MSA) operation and a group of learnable non-linear mapping functions. The MSA imitates surround suppression and defines an energy function that can be applied to each feature to measure its importance. The group of non-linear functions refines the energy calculated by the MSA into more reasonable values. With these two sub-modules, feature-wise attention can be captured well. Moreover, owing to the simple structure and few parameters of the two sub-modules, the proposed module can easily be integrated into almost any CNN. To verify the performance and effectiveness of the proposed module, several experiments were conducted on the Cifar10, Cifar100, Cinic10, and Tiny-ImageNet datasets. The experimental results demonstrate that the proposed module is flexible and effective for improving the performance of CNNs.
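The exact energy function is given in the paper itself rather than on this page, but the Minus-Square-Add idea, scoring each feature by its squared difference from its surround, can be sketched roughly as follows. The 3 × 3 average-pooled surround and the sigmoid gating below are illustrative assumptions only; the paper refines the MSA energy with the learnable non-linear functions described above rather than a fixed sigmoid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSAEnergy(nn.Module):
    """Illustrative Minus-Square-Add energy: for every feature x_i, measure
    how much it differs from its local surround (hypothetical 3x3 window)."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W). Surround approximated by a local mean (assumption).
        pad = self.kernel_size // 2
        surround = F.avg_pool2d(x, self.kernel_size, stride=1, padding=pad)
        # Minus the surround, square the difference: one energy value per feature.
        return (x - surround) ** 2


class FeatureWiseAttention(nn.Module):
    """Sketch of feature-wise attention: MSA energy -> gate in (0, 1) -> rescale."""

    def __init__(self):
        super().__init__()
        self.msa = MSAEnergy()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.sigmoid(self.msa(x))  # one attention weight per feature
        return x * weights
```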

Keywords feature-wise attention      surround suppression      image classification      convolutional neural networks     
Corresponding Author(s): Lei ZHANG   
Just Accepted Date: 09 October 2022   Issue Date: 17 January 2023
 Cite this article:   
Shuo TAN, Lei ZHANG, Xin SHU, et al. A feature-wise attention module based on the difference with surrounding features for convolutional neural networks[J]. Front. Comput. Sci., 2023, 17(6): 176338.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2126-1
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I6/176338
Fig.1  Comparison of channel-wise, spatial-wise, and feature-wise attention modules. In each subfigure, the left side represents the input features and the right side represents the feature weights calculated by the different attention modules. Most existing attention modules are channel-wise (a) or spatial-wise (b) attention modules; they assign the same attention weight to all features in the same channel or at the same spatial position, whereas a feature-wise attention module (c) can assign each feature its own attention weight
Fig.2  Overview of the proposed module. The group of non-linear functions is implemented with two 1 × 1 convolutions whose number of groups equals the number of channels, together with a tanh function
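A minimal sketch of this “group of non-linear functions”, read directly from the caption: two 1 × 1 convolutions with the number of groups equal to the number of channels (so every channel gets its own small mapping) and a tanh non-linearity. The hidden width per channel and the exact placement of the activation are assumptions for illustration.

```python
import torch
import torch.nn as nn


class PerChannelNonLinearMapping(nn.Module):
    """Sketch of the 'group of non-linear functions' from Fig. 2: two 1x1
    grouped convolutions with groups == channels (one tiny mapping per channel)
    and a tanh non-linearity. Hidden width and activation placement are
    illustrative assumptions."""

    def __init__(self, channels: int, hidden_per_channel: int = 2):
        super().__init__()
        hidden = channels * hidden_per_channel
        self.mapping = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, groups=channels, bias=True),
            nn.Tanh(),
            nn.Conv2d(hidden, channels, kernel_size=1, groups=channels, bias=True),
        )

    def forward(self, energy: torch.Tensor) -> torch.Tensor:
        # Refine the per-feature energy into attention weights of the same shape.
        return self.mapping(energy)
```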
Fig.3  The exact position where the proposed module is integrated into a ResBlock. The module is applied to each ResBlock of the ResNet
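A common way to realise this integration, and an assumption about the exact insertion point shown in Fig. 3, is to apply the attention module to the residual branch of a BasicBlock just before the shortcut addition:

```python
import torch
import torch.nn as nn


class BasicBlockWithAttention(nn.Module):
    """ResNet BasicBlock with a feature-wise attention module on the residual
    branch. The insertion point (after the second BN, before the shortcut
    addition) is an assumption for illustration; stride-1, same-width case only."""

    def __init__(self, channels: int, attention: nn.Module):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attention = attention
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attention(out)   # feature-wise attention on the residual branch
        return self.relu(out + x)   # identity shortcut
```

A block would then be built as, for example, BasicBlockWithAttention(64, FeatureWiseAttention()), reusing the hypothetical module sketched after the abstract.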
  
Model                              Cifar10        Cifar100       Cinic10
ResNet18 (Baseline) 93.21±0.38 73.37±0.20 84.84±0.45
ResNet18 + SE 93.67±0.19 73.93±0.10 85.49±0.10
ResNet18 + CBAM 93.65±0.13 73.41±0.25 85.27±0.13
ResNet18 + ECA 93.45±0.08 72.11±0.48 84.81±0.13
ResNet18 + GCT 93.15±0.35 73.51±0.31 84.97±0.49
ResNet18 + SimAM 93.57±0.11 74.21±0.21 85.48±0.19
ResNet18 + proposed module 93.78±0.09 74.39±0.09 85.53±0.03
ResNet50 (Baseline) 91.49±0.57 69.58±1.54 83.21±1.06
ResNet50 + SE 92.59±0.40 74.46±0.43 84.84±0.54
ResNet50 + CBAM 93.74±0.21 75.91±0.13 85.46±0.41
ResNet50 + ECA 92.33±1.69 74.73±0.77 85.20±0.51
ResNet50 + GCT 90.84±1.24 69.23±1.37 83.78±0.77
ResNet50 + SimAM 92.55±0.26 71.85±1.59 84.95±0.37
ResNet50 + proposed module 93.08±0.52 75.12±0.49 84.88±0.55
Tab.1  Top-1 accuracies (%) for ResNet18 and ResNet50 with different attention modules, SE [6], CBAM [7], ECA [39], GCT [34], SimAM [10], and the proposed module on the Cifar10, Cifar100, and Cinic10 datasets. All results are reported as mean±std over five trials
Model                              Parameters    Additional parameters to baseline    FLOPs    Top-1 Acc/%    Top-5 Acc/%
ResNet18 (Baseline) 11.27M 0 2.23G 65.12 84.23
ResNet18 + SE [6] 11.36M 0.0870M 2.23G 66.38 85.44
ResNet18 + CBAM [7] 11.36M 0.0899M 2.23G 66.04 85.05
ResNet18 + ECA [39] 11.27M 24 2.23G 64.94 84.60
ResNet18 + GCT [34] 11.28M 0.0058M 2.23G 66.21 85.15
ResNet18 + SimAM [10] 11.27M 0 2.23G 65.48 84.37
ResNet18 + proposed module 11.28M 0.0077M 2.23G 65.90 85.00
ResNet34 (Baseline) 21.38M 0 4.65G 66.73 85.29
ResNet34 + SE [6] 21.54M 0.1572M 4.65G 67.29 86.16
ResNet34 + CBAM [7] 21.54M 0.1628M 4.65G 67.14 85.80
ResNet34 + ECA [39] 21.38M 48 4.65G 66.22 85.35
ResNet34 + GCT [34] 21.39M 0.0113M 4.65G 67.37 85.92
ResNet34 + SimAM [10] 21.38M 0 4.65G 67.29 85.87
ResNet34 + proposed module 21.39M 0.0151M 4.65G 67.49 85.86
ResNet50 (Baseline) 23.91M 0 5.22G 68.19 86.66
ResNet50 + SE [6] 26.43M 2.5149M 5.23G 69.68 87.83
ResNet50 + CBAM [7] 26.44M 2.5326M 5.23G 69.30 87.81
ResNet50 + ECA [39] 23.91M 48 5.23G 68.63 86.60
ResNet50 + GCT [34] 21.96M 0.0453M 5.22G 69.14 87.13
ResNet50 + SimAM [10] 23.91M 0 5.22G 68.93 87.27
ResNet50 + proposed module 23.97M 0.0604M 5.25G 69.89 87.72
ResNet101 (Baseline) 42.90M 0 10.08G 69.92 87.58
ResNet101 + SE [6] 47.65M 4.7431M 10.10G 71.06 88.53
ResNet101 + CBAM [7] 47.68M 4.7810M 10.09G 70.48 88.09
ResNet101 + ECA [39] 42.90M 99 10.09G 69.52 87.61
ResNet101 + GCT [34] 43.00M 0.0975M 10.08G 71.02 88.33
ResNet101 + SimAM [10] 42.90M 0 10.08G 69.79 87.78
ResNet101 + proposed module 43.03M 0.1300M 10.13G 70.46 88.46
MobileNetV2 (Baseline) 2.54M 0 0.38G 62.65 84.47
MobileNetV2 + SE [6] 2.57M 0.0284M 0.38G 62.39 84.17
MobileNetV2 + CBAM [7] 2.57M 0.0317M 0.38G 62.39 83.89
MobileNetV2 + ECA [39] 2.54M 51 0.38G 61.74 83.50
MobileNetV2 + GCT [34] 2.54M 0.0045M 0.38G 63.55 85.09
MobileNetV2 + SimAM [10] 2.54M 0 0.38G 63.50 84.92
MobileNetV2 + proposed module 2.55M 0.0060M 0.38G 63.62 85.14
Tab.2  Parameters, additional parameters to baseline, FLOPs, Top-1 and Top-5 accuracies (%) for various models with SE [6], CBAM [7], ECA [39], GCT [34], SimAM [10] and the proposed module on Tiny-ImageNet
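The “additional parameters to baseline” column can be reproduced by counting the trainable parameters of the two models; a small helper along these lines (function names are illustrative):

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def extra_parameters(model_with_module: nn.Module, baseline: nn.Module) -> int:
    """Parameters added by an attention module relative to the baseline."""
    return count_parameters(model_with_module) - count_parameters(baseline)
```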
Fig.4  Visualization results of the intermediate features and the attention weights calculated by the MSA
Model Top-1 Acc/% Top-5 Acc/%
ResNet18 (Baseline) 65.12 84.23
ResNet18 + MSA (With normalization) 59.45 80.49
ResNet18 + MSA (With sigmoid) 65.49 84.69
ResNet18 + group of non-linear functions 65.63 84.38
ResNet18 + proposed module 65.90 85.00
ResNet34 (Baseline) 66.73 85.29
ResNet34 + MSA (With normalization) 62.49 82.38
ResNet34 + MSA (With sigmoid) 66.87 85.53
ResNet34 + group of non-linear functions 67.24 85.63
ResNet34 + proposed module 67.49 85.86
ResNet50 (Baseline) 68.19 86.66
ResNet50 + MSA (With normalization) 66.60 86.31
ResNet50 + MSA (With sigmoid) 69.16 87.25
ResNet50 + group of non-linear functions 69.50 87.22
ResNet50 + proposed module 69.89 87.72
ResNet101 (Baseline) 69.92 87.58
ResNet101 + MSA (With normalization) 59.40 81.65
ResNet101 + MSA (With sigmoid) 69.02 87.38
ResNet101 + group of non-linear functions 70.29 87.86
ResNet101 + proposed module 70.46 88.46
MobileNetV2 (Baseline) 62.65 84.47
MobileNetV2 + MSA (With normalization) 52.08 76.87
MobileNetV2 + MSA (With sigmoid) 63.43 84.80
MobileNetV2 + group of non-linear functions 63.11 84.50
MobileNetV2 + proposed module 63.62 85.14
Tab.3  Ablation studies on Tiny-ImageNet
Model Top-1 Acc/% Top-5 Acc/%
ResNet18 (Baseline) 65.12 84.23
ResNet18 + proposed module (Using sigmoid) 65.91 84.85
ResNet18 + proposed module (Using tanh) 65.90 85.00
ResNet34 (Baseline) 66.73 85.29
ResNet34 + proposed module (Using sigmoid) 67.20 86.02
ResNet34 + proposed module (Using tanh) 67.49 85.86
ResNet50 (Baseline) 68.19 86.66
ResNet50 + proposed module (Using sigmoid) 68.92 86.98
ResNet50 + proposed module (Using tanh) 69.89 87.72
ResNet101 (Baseline) 69.92 87.58
ResNet101 + proposed module (Using sigmoid) 70.31 87.71
ResNet101 + proposed module (Using tanh) 70.46 88.46
MobileNetV2 (Baseline) 62.65 84.47
MobileNetV2 + proposed module (Using sigmoid) 63.14 85.16
MobileNetV2 + proposed module (Using tanh) 63.62 85.14
Tab.4  Experiments on Tiny-ImageNet to choose the non-linear activation function of the non-linear mapping functions
Fig.5  Visualization results using Grad-CAM [41] for SE, CBAM, SimAM, and the proposed module, each integrated into ResNet50, on the Tiny-ImageNet validation set
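Grad-CAM [41] itself is independent of the proposed module; a minimal sketch of how such heatmaps are typically produced (hooking a convolutional layer and weighting its activations by the spatially averaged gradients of the class score) is shown below. The helper name and the choice of target layer are assumptions.

```python
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM sketch (after Selvaraju et al. [41]): weight the target
    layer's activations by the spatially averaged gradients of the class score."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        scores = model(image)                          # image: (1, 3, H, W)
        if class_idx is None:
            class_idx = scores.argmax(dim=1).item()
        model.zero_grad()
        scores[0, class_idx].backward()
        weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # GAP of grads
        cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # scale to [0, 1]
    finally:
        h1.remove()
        h2.remove()
    return cam
```

For a ResNet50 the target layer would typically be the last block of the final stage, e.g. grad_cam(model, image, model.layer4[-1]).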
1 Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
2 Krizhevsky A. Learning multiple layers of features from tiny images. Toronto: University of Toronto, 2009
3 Everingham M, van Gool L, Williams C K I, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303–338
4 Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 3213–3223
5 Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L. Microsoft COCO: common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. 2014, 740–755
6 Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7132–7141
7 Woo S, Park J, Lee J Y, Kweon I S. CBAM: convolutional block attention module. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 3–19
8 Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7794–7803
9 Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X. Residual attention network for image classification. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 6450–6458
10 Yang L, Zhang R Y, Li L, Xie X. SimAM: a simple, parameter-free attention module for convolutional neural networks. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 11863–11874
11 Wang L, Zhang L, Qi X, Yi Z. Deep attention-based imbalanced image classification. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 3320–3330
12 Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua T S. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 6298–6306
13 Carrasco M. Visual attention: the past 25 years. Vision Research, 2011, 51(13): 1484–1525
14 Liu N, Han J, Yang M H. PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 3089–3098
15 Zhang W, Xiao C. PCAN: 3D attention map learning using contextual information for point cloud based retrieval. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 12428–12437
16 Webb B S, Dhruv N T, Solomon S G, Tailby C, Lennie P. Early and late mechanisms of surround suppression in striate cortex of macaque. Journal of Neuroscience, 2005, 25(50): 11666–11675
17 Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014, arXiv preprint arXiv: 1409.1556
18 Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015, 1–9
19 He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770–778
20 Zagoruyko S, Komodakis N. Wide residual networks. In: Proceedings of the British Machine Vision Conference. 2016, 87.1–87.12
21 Huang G, Liu Z, van der Maaten L, Weinberger K Q. Densely connected convolutional networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 2261–2269
22 Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 1800–1807
23 Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 5987–5995
24 Domhan T, Springenberg J T, Hutter F. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In: Proceedings of the 24th International Conference on Artificial Intelligence. 2015, 3460–3468
25 Ha D, Dai A, Le Q V. Hypernetworks. 2016, arXiv preprint arXiv: 1609.09106
26 Zoph B, Le Q V. Neural architecture search with reinforcement learning. In: Proceedings of the 5th International Conference on Learning Representations. 2017
27 Mendoza H, Klein A, Feurer M, Springenberg J T, Hutter F. Towards automatically-tuned neural networks. In: Proceedings of the Workshop on Automatic Machine Learning. 2016, 58–65
28 Bello I, Zoph B, Vasudevan V, Le Q V. Neural optimizer search with reinforcement learning. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 459–468
29 Fernando C, Banarse D, Blundell C, Zwols Y, Ha D, Rusu A A, Pritzel A, Wierstra D. PathNet: evolution channels gradient descent in super neural networks. 2017, arXiv preprint arXiv: 1701.08734
30 Roy A G, Navab N, Wachinger C. Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks. IEEE Transactions on Medical Imaging, 2019, 38(2): 540–549
31 Chen Y, Kalantidis Y, Li J, Yan S, Feng J. A2-Nets: double attention networks. 2018, arXiv preprint arXiv: 1810.11579
32 Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H. Dual attention network for scene segmentation. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 3141–3149
33 Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. 2019, 1971–1980
34 Yang Z, Zhu L, Wu Y, Yang Y. Gated channel transformation for visual recognition. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 11791–11800
35 Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. 2015, 448–456
36 Darlow L N, Crowley E J, Antoniou A, Storkey A J. CINIC-10 is not ImageNet or CIFAR-10. 2018, arXiv preprint arXiv: 1810.03505
37 Lee C Y, Xie S, Gallagher P, Zhang Z, Tu Z. Deeply-supervised nets. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. 2015, 562–570
38 Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 4510–4520
39 Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q. ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 11531–11539
40 Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of the 13th European Conference on Computer Vision. 2014, 818–833
41 Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 618–626
Supplementary material: FCS-22126-OF-ST_suppl_1