A feature-wise attention module based on the difference with surrounding features for convolutional neural networks |
Shuo TAN, Lei ZHANG( ), Xin SHU, Zizhou WANG |
Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China |
|
|
Abstract Attention mechanisms have become a widely researched approach to improving the performance of convolutional neural networks (CNNs). Most research focuses on designing channel-wise and spatial-wise attention modules but neglects the unique information carried by each individual feature, which is critical for deciding both “what” and “where” to focus. In this paper, a feature-wise attention module is proposed that assigns each feature of the input feature map its own attention weight. Specifically, the module is inspired by the well-known surround suppression phenomenon from neuroscience, and it consists of two sub-modules: a Minus-Square-Add (MSA) operation and a group of learnable non-linear mapping functions. The MSA imitates surround suppression and defines an energy function that can be applied to each feature to measure its importance. The group of non-linear functions then refines the energies computed by the MSA into more reasonable values. Together, these two sub-modules capture feature-wise attention well. Moreover, owing to their simple structure and few parameters, the proposed module can be easily integrated into almost any CNN. To verify its performance and effectiveness, experiments were conducted on the CIFAR-10, CIFAR-100, CINIC-10, and Tiny-ImageNet datasets. The experimental results demonstrate that the proposed module is flexible and effective in improving CNN performance.
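The abstract does not give the exact form of the MSA operation, so the following is only a minimal sketch of one plausible reading of the idea: each feature's energy is its squared difference from the mean of the surrounding (other) features, and a sigmoid stands in for the paper's learnable non-linear mapping functions. The function names (`msa_energy`, `feature_wise_attention`) and the use of a global surround are illustrative assumptions, not the authors' actual formulation.

```python
import math

def msa_energy(features):
    """Illustrative Minus-Square-Add: for each feature value, subtract the
    mean of all *other* features (its "surround") and square the difference.
    A feature that deviates strongly from its surround gets a high energy,
    mimicking surround suppression's emphasis on distinctive responses."""
    n = len(features)
    total = sum(features)
    energies = []
    for x in features:
        surround_mean = (total - x) / (n - 1)  # mean of the other features
        energies.append((x - surround_mean) ** 2)
    return energies

def feature_wise_attention(features):
    """Map each energy to a weight in (0, 1) with a sigmoid (a fixed
    stand-in for the paper's learnable non-linear mappings) and rescale
    the input features by their weights."""
    weights = [1.0 / (1.0 + math.exp(-e)) for e in msa_energy(features)]
    return [x * w for x, w in zip(features, weights)]
```

In this toy version, a feature that stands out from its neighbors (e.g. 5.0 among 1.0s) receives an energy of 16.0 versus roughly 1.78 for the background features, so its attention weight saturates near 1 while the others stay closer to 0.5.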
|
Keywords
feature-wise attention
surround suppression
image classification
convolutional neural networks
|
Corresponding Author(s):
Lei ZHANG
|
Just Accepted Date: 09 October 2022
Issue Date: 17 January 2023
|
|