Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (5) : 185327    https://doi.org/10.1007/s11704-023-2563-5
Artificial Intelligence
Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation
Zizhang WU1, Yuanzhu GAN1, Tianhao XU1,2, Fan WANG1
1. Computer Vision Perception Department of ZongMu Technology, Shanghai 201203, China
2. Faculty of Electrical Engineering, Information Technology, Physics, Technical University of Braunschweig, Braunschweig 38106, Germany
Abstract

Transformer-based semantic segmentation approaches, which divide the image into different regions by sliding windows and model the relations inside each window, have achieved outstanding success. However, since the relations between windows were not the primary emphasis of previous work, they were not fully exploited. To address this issue, we propose Graph-Segmenter, an effective network comprising a graph transformer and a boundary-aware attention module, which simultaneously models the deeper relations between windows (a global view) and among the pixels inside each window (a local view), and performs boundary adjustment at low cost. Specifically, we treat every window, and every pixel inside a window, as a node, construct graphs for both views, and devise the graph transformer accordingly. The boundary-aware attention module refines the edges of target objects by modeling the relationships among the pixels on an object's boundary. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k and PASCAL Context) demonstrate that our proposed network, a graph transformer with boundary-aware attention, achieves state-of-the-art segmentation performance.
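The hierarchical graph construction described in the abstract (windows as global-graph nodes, pixels inside each window as local-graph nodes) can be illustrated with a short NumPy sketch. This is not code from the paper: the function name and the mean-pooling used to form a window node are our own assumptions.

```python
import numpy as np

def build_window_nodes(feat, win):
    """Partition an (H, W, C) feature map into non-overlapping win x win
    windows. Each window's mean feature becomes one node of the global
    graph; the pixels inside a window are the nodes of its local graph."""
    H, W, C = feat.shape
    h, w = H // win, W // win
    # crop to a multiple of the window size, then split into windows
    wins = feat[:h * win, :w * win].reshape(h, win, w, win, C)
    wins = wins.transpose(0, 2, 1, 3, 4).reshape(h * w, win * win, C)
    global_nodes = wins.mean(axis=1)   # (num_windows, C) global-graph nodes
    return global_nodes, wins          # local graphs: (num_windows, win*win, C)

# toy usage: an 8x8 feature map with 3 channels, window size 4
feat = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
g, l = build_window_nodes(feat, 4)     # 4 windows of 16 pixels each
```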

Keywords: graph transformer; graph relation network; boundary-aware; attention; semantic segmentation
Corresponding Author(s): Zizhang WU   
Just Accepted Date: 22 May 2023   Issue Date: 10 July 2023
 Cite this article:   
Zizhang WU, Yuanzhu GAN, Tianhao XU, et al. Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation[J]. Front. Comput. Sci., 2024, 18(5): 185327.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2563-5
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I5/185327
Fig.1  An illustration of the proposed Graph-Segmenter with boundary-aware attention for semantic segmentation. The top shows the segmentation results of previous transformer-based semantic segmentation methods (e.g., Swin [14]). The bottom shows the segmentation results of our proposed Graph-Segmenter, which achieves promising boundary segmentation via hierarchical graph reasoning and efficient boundary adjustment that requires no additional annotation
Fig.2  An overview of the proposed Graph-Segmenter with efficient boundary adjustment for semantic segmentation, which includes Global Relation Modeling, Local Relation Modeling, and Boundary-aware Attention. “GR” denotes the global window-aware relation module and “LR” denotes the local window-aware relation module. “GR” or “LR” consists of 1×1 convolution, softmax, W and 1×1 convolution. W is the learnable weighting matrix, corresponding to W(l) in Eq. (6), and also equivalent to Cgr in “GR” or Clr in “LR”
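A minimal sketch of what one "GR"/"LR" relation block in Fig.2 might compute, with the two 1×1 convolutions reduced to plain linear maps, a softmax adjacency over the graph nodes, the learnable weighting W, and a residual connection. All names and the dot-product similarity are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_module(X, W):
    """Sketch of a window-aware relation block: build a soft adjacency
    over the N nodes, aggregate neighbour features, reweight channels
    with the learnable matrix W, and add a residual connection."""
    A = softmax(X @ X.T)   # (N, N) pairwise node relations, rows sum to 1
    out = A @ X @ W        # aggregate neighbours, then channel reweighting
    return X + out         # residual keeps the original features

# toy usage: 4 window-nodes with 8-dim features
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = np.eye(8) * 0.1        # stand-in for the learned weighting matrix
Y = relation_module(X, W)
```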
Fig.3  Three designs of fusion. “GR” denotes the global relation modeling module and “LR” denotes the local relation modeling module. (a) Two relation modeling modules are connected in two different series connections; (b) two relation modeling modules are connected in parallel
Method # param. GMac
Swin* [14] 233.66 191.45
Graph-Segmenter (Ours) 283.46 195.63
Tab.1  Complexity comparison of our Graph-Segmenter model and the original Swin [14] model with Swin-L as the backbone, in terms of the number of model parameters (# param.) and computational cost in giga multiply-accumulate operations (GMac). * denotes results reproduced by us
Fig.4  Some examples of the three datasets: Cityscapes (first row), ADE-20k (second row), and PASCAL Context (third row)
Method Backbone val mIoU
FCN [23] ResNet-101 76.6
Non-local [66] ResNet-101 79.1
DLab.v3+ [26] ResNet-101 79.3
DNL [65] ResNet-101 80.5
DenseASPP [67] DenseNet 80.6
DPC [68] Xception-71 80.8
CCNet [31] ResNet-101 81.3
DANet [27] ResNet-101 81.5
Panoptic-Deeplab [69] Xception-71 81.5
Strip Pooling [70] ResNet-101 81.9
Seg-L-Mask/16 [42] ViT-L 81.3
SETR-PUP [43] ViT-L 82.2
Swin* [14] Swin-L 82.3
Graph-Segmenter (Ours) Swin-L 82.9
Tab.2  Multi-scale inference results of semantic segmentation on the Cityscapes validation dataset compared with state-of-the-art methods. * denotes results reproduced by us
Method Backbone Test mIoU
PSPNet [25] ResNet-101 78.4
BiSeNet [71] ResNet-101 78.9
PSANet [72] ResNet-101 80.1
OCNet [73] ResNet-101 80.1
BFP [34] ResNet-101 81.4
DANet [27] ResNet-101 81.5
CCNet [31] ResNet-101 81.9
SETR-PUP [43] ViT-L 81.1
Swin* [14] Swin-L 80.6
Graph-Segmenter (Ours) Swin-L 81.9
Tab.3  Results of semantic segmentation on the Cityscapes test dataset compared with state-of-the-art methods. * denotes results reproduced by us
Method Backbone val mIoU
FCN [23] ResNet-101 41.4
UperNet [74] ResNet-101 44.9
DANet [27] ResNet-101 45.3
OCRNet [29] ResNet-101 45.7
ACNet [75] ResNet-101 45.9
DNL [65] ResNet-101 46.0
DLab.v3+ [26] ResNeSt-101 47.3
DLab.v3+ [26] ResNeSt-200 48.4
SETR-PUP [43] ViT-L 50.3
SegFormer-B5 [15] SegFormer 51.8
Seg-L-Mask/16 [42] ViT-L 53.6
Swin* [14] Swin-L 53.1
Graph-Segmenter (Ours) Swin-L 53.9
Tab.4  Multi-scale inference results of semantic segmentation on the ADE-20k validation dataset compared with state-of-the-art methods. * denotes results reproduced by us
Method Backbone Test score
ACNet [75] ResNet-101 38.5
DLab.v3+ [26] ResNeSt-101 55.1
OCRNet [29] ResNet-101 56.0
DNL [65] ResNet-101 56.2
SETR-PUP [43] ViT-L 61.7
Swin* [14] Swin-L 61.5
Graph-Segmenter (Ours) Swin-L 62.4
Tab.5  Results of semantic segmentation on the ADE-20k test dataset compared with state-of-the-art methods. * denotes results reproduced by us
Fig.5  Val mIoU comparison with Swin Transformer [14] in each class on the Cityscapes val set. Our Graph-Segmenter generally achieves better performance. “sw” denotes “sidewalk”, “bd” denotes “building”, “tl” denotes “traffic light”, “ts” denotes “traffic sign”, “vg” denotes “vegetation”, “mc” denotes “motorcycle”, and “bc” denotes “bicycle”
Method Backbone mIoU
FCN [23] ResNet-101 45.74
PSPNet [25] ResNet-101 47.80
DANet [27] ResNet-101 52.60
EMANet [77] ResNet-101 53.10
SVCNet [78] ResNet-101 53.20
BFP [34] ResNet-101 53.6
Strip pooling [70] ResNet-101 54.50
GFFNet [32] ResNet-101 54.20
APCNet [33] ResNet-101 54.70
GRAr [28] ResNet-101 55.70
SETR-PUP [43] ViT-L 55.83
UperNet [74] Swin-L 57.48
Graph-Segmenter + UperNet (Ours) Swin-L 57.80
Graph-Segmenter + UperNet + CAR (Ours) Swin-L 59.01
Tab.6  Results of semantic segmentation on the PASCAL Context dataset compared with state-of-the-art methods
Fig.6  Qualitative performance comparison of our proposed Graph-Segmenter with DeepLabV3+ [26] and Swin Transformer [14] for semantic segmentation. Our Graph-Segmenter obtains better segmentation boundaries
θ Backbone val mIoU
2v Swin-T 76.44
v Swin-T 76.27
v/2 Swin-T 76.07
v/4 Swin-T 77.64
v/8 Swin-T 77.05
Tab.7  mIoU performance of different θ on the Cityscapes dataset. We set v = (1/K²) Σᵢ Σⱼ rᵢⱼ (the mean over the K×K relation values) for convenience of expression. “-T” denotes Tiny
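This excerpt does not spell out how θ is used; a plausible reading, consistent with defining v as the mean of the K×K relation values rᵢⱼ, is that θ prunes weak pairwise relations to keep the graph sparse. The sketch below assumes exactly that; `sparsify_relations` and `theta_scale` are our own names, and θ = v/4 (`theta_scale = 0.25`) matches the best-scoring row of Tab.7.

```python
import numpy as np

def sparsify_relations(r, theta_scale=0.25):
    """Assumed θ-thresholding: v is the mean relation value
    v = (1/K^2) * sum_ij r[i, j]; relations below theta = theta_scale * v
    are set to zero, sparsifying the relation graph."""
    v = r.mean()                    # v = (1/K^2) sum_i sum_j r_ij
    theta = theta_scale * v
    return np.where(r >= theta, r, 0.0)

# toy 2x2 relation matrix: v = 0.5, so theta = 0.125 prunes the 0.1 entry
r = np.array([[1.0, 0.1], [0.4, 0.5]])
s = sparsify_relations(r)
```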
Connection type Backbone val mIoU
GR ∥ LR Swin-T 76.70
LR followed by GR Swin-T 76.44
GR followed by LR Swin-T 76.99
Tab.8  mIoU performance of different connection types on the Cityscapes dataset. “∥” denotes parallel connection. “-T” denotes Tiny
Method Backbone val mIoU
Swin [14] Swin-T 75.82
Swin-GT Swin-T 76.99
Swin-BA Swin-T 75.94
Swin-GTBA Swin-T 77.32
Tab.9  mIoU performance of different components on the Cityscapes dataset. “-T” denotes Tiny
Method Backbone val mIoU
Swin [14] Swin-T 75.82
Swin-GR Swin-T 76.24
Swin-LR Swin-T 76.28
Swin-GT Swin-T 76.99
Tab.10  mIoU performance of the two components of the graph transformer module on the Cityscapes dataset. “-T” denotes Tiny
r r=2 r=4 r=8 r=16 r=32
mIoU 76.65 76.15 76.29 76.99 76.35
Tab.11  mIoU performance of Graph-Segmenter (with only the graph transformer module) on the Cityscapes dataset for various channel compression ratios r
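Tab.11's channel compression ratio r suggests a 1×1-conv-style bottleneck that shrinks C channels to C/r before relation modeling to cut cost, then expands back. A hedged NumPy sketch of that idea; the random weights stand in for learned parameters, and all names are ours rather than the authors'.

```python
import numpy as np

def compress_channels(X, r=16, rng=None):
    """Assumed bottleneck for compression ratio r: map C channels down
    to C // r, apply a ReLU nonlinearity, then expand back to C.
    The two linear maps play the role of 1x1 convolutions."""
    rng = rng or np.random.default_rng(0)
    C = X.shape[-1]
    W_down = rng.standard_normal((C, C // r)) / np.sqrt(C)
    W_up = rng.standard_normal((C // r, C)) / np.sqrt(C // r)
    return np.maximum(X @ W_down, 0) @ W_up   # reduce -> ReLU -> expand

# toy usage: 5 tokens with 64 channels; r = 16 gives a bottleneck of 4
X = np.ones((5, 64))
Y = compress_channels(X, r=16)
```

The bottleneck cuts the cost of the subsequent N×N relation computation roughly by a factor of r, which is consistent with Tab.11 ablating r rather than removing the compression entirely.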
1 Ruan H, Song H, Liu B, Cheng Y, Liu Q. Intellectual property protection for deep semantic segmentation models. Frontiers of Computer Science, 2023, 17(1): 171306
2 Zhang D, Zhou Y, Zhao J, Yang Z, Dong H, Yao R, Ma H. Multi-granularity semantic alignment distillation learning for remote sensing image semantic segmentation. Frontiers of Computer Science, 2022, 16(4): 164351
3 Grigorescu S, Trasnea B, Cocias T, Macesanu G. A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 2020, 37(3): 362−386
4 Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Gläser C, Timm F, Wiesbeck W, Dietmayer K. Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(3): 1341−1360
5 Janai J, Güney F, Behl A, Geiger A. Computer vision for autonomous vehicles: problems, datasets and state of the art. Foundations and Trends® in Computer Graphics and Vision, 2020, 12(1−3): 1−308
6 Arnold E, Al-Jarrah O Y, Dianati M, Fallah S, Oxtoby D, Mouzakitis A. A survey on 3D object detection methods for autonomous driving applications. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(10): 3782−3795
7 Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G. Understanding convolution for semantic segmentation. In: Proceedings of the Winter Conference on Applications of Computer Vision. 2018, 1451−1460
8 Devlin J, Chang M W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, 4171−4186
9 Wang L, Li D, Zhu Y, Tian L, Shan Y. Dual super-resolution learning for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 3773−3782
10 Yu C, Wang J, Gao C, Yu G, Shen C, Sang N. Context prior for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 12413−12422
11 Rae J W, Potapenko A, Jayakumar S M, Lillicrap T P. Compressive transformers for long-range sequence modelling. In: Proceedings of the 8th International Conference on Learning Representations. 2020
12 Lee J, Lee Y, Kim J, Kosiorek A, Choi S, Teh Y W. Set transformer: a framework for attention-based permutation-invariant neural networks. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 3744−3753
13 Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16x16 words: transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations. 2021
14 Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2021, 9992−10002
15 Xie E, Wang W, Yu Z, Anandkumar A, Alvarez J M, Luo P. SegFormer: simple and efficient design for semantic segmentation with transformers. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
16 Chu X, Tian Z, Wang Y, Zhang B, Ren H, Wei X, Xia H, Shen C. Twins: revisiting the design of spatial attention in vision transformers. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
17 Fang J, Xie L, Wang X, Zhang X, Liu W, Tian Q. MSG-transformer: exchanging local spatial information by manipulating messenger tokens. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, 12053−12062
18 Wang P, Wang X, Wang F, Lin M, Chang S, Li H, Jin R. KVT: k-NN attention for boosting vision transformers. In: Proceedings of the 17th European Conference on Computer Vision. 2022, 285−302
19 Chu X, Zhang B, Tian Z, Wei X, Xia H. Do we really need explicit position encodings for vision transformers? 2021, arXiv preprint arXiv: 2102.10882
20 Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2016, 3213−3223
21 Zhou B, Zhao H, Puig X, Xiao T, Fidler S, Barriuso A, Torralba A. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 2019, 127(3): 302−321
22 Mottaghi R, Chen X, Liu X, Cho N G, Lee S, Fidler S, Urtasun R, Yuille A. The role of context for object detection and semantic segmentation in the wild. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2014, 891−898
23 Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2015, 3431−3440
24 Shen Y, Zhang H, Fan Y, Lee A P, Xu L. Smart health of ultrasound telemedicine based on deeply represented semantic segmentation. IEEE Internet of Things Journal, 2021, 8(23): 16770−16778
25 Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2017, 6230−6239
26 Chen L C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 833−851
27 Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H. Dual attention network for scene segmentation. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 3141−3149
28 Ding H, Zhang H, Liu J, Li J, Feng Z, Jiang X. Interaction via bi-directional graph of semantic region affinity for scene parsing. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2021, 15828−15838
29 Yuan Y, Chen X, Wang J. Object-contextual representations for semantic segmentation. In: Proceedings of the European Conference on Computer Vision. 2020
30 Li X, You A, Zhu Z, Zhao H, Yang M, Yang K, Tan S, Tong Y. Semantic flow for fast and accurate scene parsing. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 775−793
31 Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W. CCNet: criss-cross attention for semantic segmentation. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2019, 603−612
32 Li X, Zhao H, Han L, Tong Y, Tan S, Yang K. Gated fully fusion for semantic segmentation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 11418−11425
33 He J, Deng Z, Zhou L, Wang Y, Qiao Y. Adaptive pyramid context network for semantic segmentation. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 7511−7520
34 Ding H, Jiang X, Liu A Q, Thalmann N M, Wang G. Boundary-aware feature propagation for scene segmentation. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2019, 6818−6828
35 Mnih V, Heess N, Graves A, Kavukcuoglu K. Recurrent models of visual attention. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014, 2204−2212
36 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd International Conference on Learning Representations. 2015
37 Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A, Tran D. Image transformer. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 4055−4064
38 Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 213−229
39 Zhu X, Su W, Lu L, Li B, Wang X, Dai J. Deformable DETR: deformable transformers for end-to-end object detection. In: Proceedings of the 9th International Conference on Learning Representations. 2021
40 Wang Y, Xu Z, Wang X, Shen C, Cheng B, Shen H, Xia H. End-to-end video instance segmentation with transformers. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 8737−8746
41 Wang Y, Guizilini V, Zhang T, Wang Y, Zhao H, Solomon J. DETR3D: 3D object detection from multi-view images via 3D-to-2D queries. In: Proceedings of the Conference on Robot Learning. 2021, 180−191
42 Strudel R, Garcia R, Laptev I, Schmid C. Segmenter: transformer for semantic segmentation. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2021, 7242−7252
43 Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr P H S, Zhang L. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 6877−6886
44 Zhang L, Li X, Arnab A, Yang K, Tong Y, Torr P H S. Dual graph convolutional network for semantic segmentation. In: Proceedings of the 30th British Machine Vision Conference 2019. 2019, 254
45 Pan S Y, Lu C Y, Lee S P, Peng W H. Weakly-supervised image semantic segmentation using graph convolutional networks. In: Proceedings of IEEE International Conference on Multimedia and Expo. 2021, 1−6
46 Wang H, Dong L, Sun M. Local feature aggregation algorithm based on graph convolutional network. Frontiers of Computer Science, 2022, 16(3): 163309
47 Wu J, He X, Wang X, Wang Q, Chen W, Lian J, Xie X. Graph convolution machine for context-aware recommender system. Frontiers of Computer Science, 2022, 16(6): 166614
48 Bruna J, Zaremba W, Szlam A, LeCun Y. Spectral networks and locally connected networks on graphs. In: Proceedings of the 2nd International Conference on Learning Representations. 2014
49 Velickovic P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations. 2018
50 Zhang L, Xu D, Arnab A, Torr P H S. Dynamic graph message passing networks. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 3723−3732
51 Zhu Y, Xu X, Shen F, Ji Y, Gao L, Shen H T. PoseGTAC: graph transformer encoder-decoder with atrous convolution for 3D human pose estimation. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence. 2021, 1359−1365
52 Dong X, Long C, Xu W, Xiao C. Dual graph convolutional networks with transformer and curriculum learning for image captioning. In: Proceedings of the 29th ACM International Conference on Multimedia. 2021, 2615−2624
53 Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018, 7444−7452
54 Li T, Zhang K, Shen S, Liu B, Liu Q, Li Z. Image co-saliency detection and instance co-segmentation using attention graph clustering based graph convolutional network. IEEE Transactions on Multimedia, 2022, 24: 492−505
55 Li X, Yang Y, Zhao Q, Shen T, Lin Z, Liu H. Spatial pyramid based graph reasoning for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 8947−8956
56 Hu H, Ji D, Gan W, Bai S, Wu W, Yan J. Class-wise dynamic graph convolution for semantic segmentation. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 1−17
57 Zhang Y, Liu M, He J, Pan F, Guo Y. Affinity fusion graph-based framework for natural image segmentation. IEEE Transactions on Multimedia, 2022, 24: 440−450
58 Chen C, Qian S, Fang Q, Xu C. HAPGN: hierarchical attentive pooling graph network for point cloud segmentation. IEEE Transactions on Multimedia, 2021, 23: 2335−2346
59 Su Y, Liu W, Yuan Z, Cheng M, Zhang Z, Shen X, Wang C. DLA-Net: learning dual local attention features for semantic segmentation of large-scale building facade point clouds. Pattern Recognition, 2022, 123: 108372
60 Liu Y, Yang S, Li B, Zhou W, Xu J, Li H, Lu Y. Affinity derivation and graph merge for instance segmentation. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 708−724
61 Zhang Z, Cui P, Zhu W. Deep learning on graphs: a survey. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 249−270
62 Wu Z, Pan S, Chen F, Long G, Zhang C, Yu P S. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4−24
63 Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 1025−1035
64 Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations. 2017
65 Yin M, Yao Z, Cao Y, Li X, Zhang Z, Lin S, Hu H. Disentangled non-local neural networks. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 191−207
66 Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7794−7803
67 Yang M, Yu K, Zhang C, Li Z, Yang K. DenseASPP for semantic segmentation in street scenes. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 3684−3692
68 Chen L C, Collins M D, Zhu Y, Papandreou G, Zoph B, Schroff F, Adam H, Shlens J. Searching for efficient multi-scale architectures for dense image prediction. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 8713−8724
69 Cheng B, Collins M D, Zhu Y, Liu T, Huang T S, Adam H, Chen L C. Panoptic-DeepLab: a simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 12472−12482
70 Hou Q, Zhang L, Cheng M M, Feng J. Strip pooling: rethinking spatial pooling for scene parsing. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 4002−4011
71 Yu C, Wang J, Peng C, Gao C, Yu G, Sang N. BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 334−349
72 Zhao H, Zhang Y, Liu S, Shi J, Loy C C, Lin D, Jia J. PSANet: point-wise spatial attention network for scene parsing. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 270−286
73 Yuan Y, Huang L, Guo J, Zhang C, Chen X, Wang J. OCNet: object context network for scene parsing. 2018, arXiv preprint arXiv: 1809.00916
74 Xiao T, Liu Y, Zhou B, Jiang Y, Sun J. Unified perceptual parsing for scene understanding. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 432−448
75 Fu J, Liu J, Wang Y, Li Y, Bao Y, Tang J, Lu H. Adaptive context network for scene parsing. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2019, 6747−6756
76 Huang Y, Kang D, Chen L, Zhe X, Jia W, Bao L, He X. CAR: class-aware regularizations for semantic segmentation. In: Proceedings of the 17th European Conference on Computer Vision. 2022, 518−534
77 Li X, Zhong Z, Wu J, Yang Y, Lin Z, Liu H. Expectation-maximization attention networks for semantic segmentation. In: Proceedings of IEEE/CVF International Conference on Computer Vision. 2019, 9166−9175
78 Ding H, Jiang X, Shuai B, Liu A Q, Wang G. Semantic correlation promoted shape-variant context for segmentation. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 8877−8886