Learning from shortcut: a shortcut-guided approach for explainable graph learning
Linan YUE1, Qi LIU1,2 (✉), Ye LIU1, Weibo GAO1, Fangzhou YAO1
1. State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei 230026, China; 2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230026, China
The remarkable success of graph neural networks (GNNs) has promoted the development of explainable graph learning methods. Among them, graph rationalization methods have drawn significant attention: they aim to support predictions with explanations by identifying a small subset of the original graph (i.e., the rationale). Although existing methods have achieved promising results, recent studies have shown that they still exploit shortcuts in the data to yield task results and compose rationales. Unlike previous methods that remain plagued by shortcuts, in this paper we propose a Shortcut-guided Graph Rationalization (SGR) method, which identifies rationales by learning from shortcuts. Specifically, SGR consists of two training stages. In the first stage, we train a shortcut guider with an early stop strategy to capture shortcut information. In the second stage, SGR separates the input graph into rationale and non-rationale subgraphs and lets them learn from the shortcut information generated by the frozen shortcut guider, so that SGR can distinguish which information belongs to shortcuts and which does not. Finally, we treat the non-rationale subgraphs as environments and identify the invariant rationales that filter out shortcuts under environment shifts. Extensive experiments on synthetic and real-world datasets validate the effectiveness of the proposed SGR method and underscore its ability to provide faithful explanations.
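To make the two-stage pipeline concrete, here is a minimal PyTorch sketch of one SGR-style training step. The module names (ShortcutGuider, Selector), the mean-pooling, and the KL-based objectives are illustrative assumptions rather than the authors' released code: we approximate "learning from shortcuts" by aligning the non-rationale (environment) subgraph with the frozen guider's prediction while pushing the rationale toward being uninformative to the guider.

```python
# Illustrative sketch only: MLPs stand in for the GNN encoder, and the loss
# terms below are one plausible reading of SGR, not the paper's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_classes, n_nodes = 16, 3, 30

class ShortcutGuider(nn.Module):
    """Stage 1: a classifier trained only briefly, so it latches onto shortcuts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, h):
        return self.net(h)

class Selector(nn.Module):
    """Stage 2: a soft node mask splitting a graph into rationale / non-rationale parts."""
    def __init__(self):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, node_h):                           # node_h: [n_nodes, dim]
        mask = torch.sigmoid(self.score(node_h))         # per-node keep probability
        rationale = (mask * node_h).mean(dim=0)          # pooled rationale subgraph
        environment = ((1 - mask) * node_h).mean(dim=0)  # pooled non-rationale subgraph
        return rationale, environment

# ---- Stage 1: train the guider for a few epochs (early stop), then freeze it ----
guider = ShortcutGuider()
# ... a short supervised run on (graph, label) pairs would go here ...
guider.requires_grad_(False)

# ---- Stage 2: the rationale sheds shortcuts while the environment absorbs them ----
selector, predictor = Selector(), nn.Linear(dim, n_classes)
opt = torch.optim.Adam([*selector.parameters(), *predictor.parameters()], lr=1e-3)

node_h, label = torch.randn(n_nodes, dim), torch.tensor([1])    # one toy graph
rationale, environment = selector(node_h)
task_loss = F.cross_entropy(predictor(rationale).unsqueeze(0), label)

shortcut_probs = F.softmax(guider(node_h.mean(dim=0)), dim=-1)  # frozen guider's view
# the environment subgraph should reproduce the shortcut prediction ...
env_loss = F.kl_div(F.log_softmax(guider(environment), dim=-1),
                    shortcut_probs, reduction="sum")
# ... while the rationale should look uninformative to the shortcut guider
uniform = torch.full((n_classes,), 1.0 / n_classes)
rat_loss = F.kl_div(F.log_softmax(guider(rationale), dim=-1),
                    uniform, reduction="sum")

loss = task_loss + env_loss + rat_loss
opt.zero_grad(); loss.backward(); opt.step()
```

Note that freezing the guider's parameters still lets gradients flow through its inputs, which is what lets the selector learn which nodes carry shortcut information.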
Fig.1 An example of motif type prediction, where Cycle and House are motif labels, and Tree and Wheel are bases irrelevant to the motif prediction. In the training dataset, Cycle co-occurs with Tree and House with Wheel. When the model relies too heavily on this co-occurrence (i.e., a shortcut) for prediction, it is likely to misclassify on a test dataset where the distribution shifts
Fig.2 Architecture of SGR in the second stage, including the selector, the predictor, and the frozen shortcut guider
| Statistic | b = 0.5 | b = 0.7 | b = 0.9 | Cycle-Tree | Graph-SST2 |
|---|---|---|---|---|---|
| Train/Val/Test | 3,000/3,000/6,000 | 3,000/3,000/6,000 | 3,000/3,000/6,000 | 4,000/4,000/6,000 | 28,327/3,147/12,305 |
| Classes | 3 | 3 | 3 | 3 | 2 |
| Avg. Nodes | 29.6 | 30.8 | 29.4 | 28.9 | 13.7 |
| Avg. Edges | 42.0 | 45.9 | 42.5 | 45.1 | 25.3 |

Tab.1 Statistics of the Spurious-Motif (b = 0.5, b = 0.7, b = 0.9, Cycle-Tree) and Graph-SST2 datasets
| Statistic | MolHIV | MolToxCast | MolBACE | MolBBBP | MolSIDER |
|---|---|---|---|---|---|
| Train/Val/Test | 32,901/4,113/4,113 | 6,860/858/858 | 1,210/151/152 | 1,631/204/204 | 1,141/143/143 |
| Classes | 2 | 617 | 2 | 2 | 27 |
| Avg. Nodes | 25.5 | 18.8 | 34.1 | 24.1 | 33.6 |
| Avg. Edges | 27.5 | 19.3 | 36.9 | 26.0 | 35.4 |

Tab.2 Statistics of the OGBG datasets
Fig.3 Training losses and accuracy on balanced and biased examples across different datasets. (a) Cycle-Tree; (b) Cycle-Ladder; (c) Cycle-Wheel
Fig.4 Accuracy of GIN and GCN on the unbiased and biased test sets. Both models achieve promising results on the biased test set but perform poorly on the unbiased one
Fig.5 Performance of SGR with different shortcut guiders that are trained with the early stop strategy
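Fig.3 indicates that the networks fit the biased, shortcut-aligned examples within the first few epochs, which is why a briefly trained guider serves as a shortcut proxy. Below is a hedged sketch of such an early-stopped run; the helper name train_shortcut_guider, the epoch budget, and the plateau threshold are illustrative choices, not values from the paper.

```python
# Sketch of the early-stop idea: halt once training loss plateaus at a shallow
# optimum, before the model moves past shortcuts to memorize hard examples.
import torch
import torch.nn.functional as F

def train_shortcut_guider(model, loader, max_epochs=5, tol=1e-3, patience=1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        total = 0.0
        for graph_emb, label in loader:      # pooled graph embeddings + labels
            loss = F.cross_entropy(model(graph_emb), label)
            opt.zero_grad(); loss.backward(); opt.step()
            total += loss.item()
        if total < best - tol:
            best, stale = total, 0
        else:
            stale += 1
            if stale > patience:             # loss stopped improving: stop here
                break
    return model.requires_grad_(False)       # frozen shortcut proxy for stage 2

# toy usage with random "graphs": the loop stops within max_epochs
model = torch.nn.Sequential(torch.nn.Linear(16, 3))
loader = [(torch.randn(8, 16), torch.randint(0, 3, (8,))) for _ in range(10)]
train_shortcut_guider(model, loader)
```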
GIN is the backbone:

| Method | MolHIV | MolToxCast | MolBACE | MolBBBP | MolSIDER |
|---|---|---|---|---|---|
| GIN | 0.7447 ± 0.0293 | 0.6521 ± 0.0172 | 0.8047 ± 0.0172 | 0.6584 ± 0.0224 | 0.5977 ± 0.0176 |
| DIR | 0.6303 ± 0.0607 | 0.5451 ± 0.0092 | 0.7391 ± 0.0282 | 0.6460 ± 0.0139 | 0.4989 ± 0.0115 |
| DisC | 0.7731 ± 0.0101 | 0.6662 ± 0.0089 | 0.8293 ± 0.0171 | 0.6963 ± 0.0206 | 0.5846 ± 0.0169 |
| RGDA | 0.7714 ± 0.0153 | 0.6694 ± 0.0043 | 0.8187 ± 0.0195 | 0.6953 ± 0.0229 | 0.5864 ± 0.0052 |
| CAL | 0.7339 ± 0.0077 | 0.6476 ± 0.0066 | 0.7848 ± 0.0107 | 0.6582 ± 0.0397 | 0.5965 ± 0.0116 |
| GSAT | 0.7524 ± 0.0166 | 0.6174 ± 0.0069 | 0.7021 ± 0.0354 | 0.6722 ± 0.0197 | 0.6041 ± 0.0096 |
| DARE | 0.7836 ± 0.0015 | 0.6677 ± 0.0058 | 0.8239 ± 0.0192 | 0.6820 ± 0.0246 | 0.5921 ± 0.0260 |
| SGR | 0.7945 ± 0.0071 | 0.6723 ± 0.0061 | 0.8305 ± 0.0098 | 0.7021 ± 0.0190 | 0.6092 ± 0.0288 |

GCN is the backbone:

| Method | MolHIV | MolToxCast | MolBACE | MolBBBP | MolSIDER |
|---|---|---|---|---|---|
| GCN | 0.7128 ± 0.0188 | 0.6497 ± 0.0114 | 0.8135 ± 0.0256 | 0.6665 ± 0.0242 | 0.6108 ± 0.0075 |
| DIR | 0.4258 ± 0.1084 | 0.5077 ± 0.0094 | 0.7002 ± 0.0634 | 0.5069 ± 0.1099 | 0.5224 ± 0.0243 |
| DisC | 0.7791 ± 0.0137 | 0.6626 ± 0.0055 | 0.8104 ± 0.0202 | 0.7061 ± 0.0105 | 0.6110 ± 0.0091 |
| RGDA | 0.7816 ± 0.0079 | 0.6622 ± 0.0045 | 0.8044 ± 0.0063 | 0.6970 ± 0.0089 | 0.6133 ± 0.0239 |
| CAL | 0.7501 ± 0.0094 | 0.6006 ± 0.0031 | 0.7802 ± 0.0207 | 0.6635 ± 0.0257 | 0.5559 ± 0.0151 |
| GSAT | 0.7598 ± 0.0085 | 0.6124 ± 0.0082 | 0.7141 ± 0.0233 | 0.6437 ± 0.0082 | 0.6179 ± 0.0041 |
| DARE | 0.7523 ± 0.0041 | 0.6618 ± 0.0065 | 0.8066 ± 0.0178 | 0.6823 ± 0.0068 | 0.6192 ± 0.0079 |
| SGR | 0.7822 ± 0.0079 | 0.6668 ± 0.0026 | 0.8228 ± 0.0283 | 0.7116 ± 0.0169 | 0.6217 ± 0.0291 |

Tab.3 Graph classification ROC-AUC on the OGBG test sets
GIN is the backbone:

| Method | b = 0.5 | b = 0.7 | b = 0.9 | Cycle-Tree | Graph-SST2 |
|---|---|---|---|---|---|
| GIN | 0.3950 ± 0.0471 | 0.3872 ± 0.0531 | 0.3768 ± 0.0447 | 0.3736 ± 0.0270 | 0.8269 ± 0.0259 |
| DIR | 0.4444 ± 0.0621 | 0.4891 ± 0.0761 | 0.4131 ± 0.0652 | 0.4039 ± 0.0425 | 0.8083 ± 0.0115 |
| DisC | 0.4585 ± 0.0660 | 0.4885 ± 0.1154 | 0.3859 ± 0.0400 | 0.4882 ± 0.1007 | 0.8279 ± 0.0081 |
| RGDA | 0.4251 ± 0.0458 | 0.5331 ± 0.1509 | 0.4568 ± 0.0779 | 0.3702 ± 0.0223 | 0.8301 ± 0.0088 |
| CAL | 0.4734 ± 0.0681 | 0.5541 ± 0.0323 | 0.4474 ± 0.0128 | 0.4362 ± 0.0642 | 0.8181 ± 0.0094 |
| GSAT | 0.4517 ± 0.0422 | 0.5567 ± 0.0458 | 0.4732 ± 0.0367 | 0.3769 ± 0.0108 | 0.8272 ± 0.0064 |
| DARE | 0.4843 ± 0.1080 | 0.4002 ± 0.0404 | 0.4331 ± 0.0631 | 0.4527 ± 0.0562 | 0.8320 ± 0.0086 |
| SGR | 0.4941 ± 0.0968 | 0.5686 ± 0.1211 | 0.4658 ± 0.0672 | 0.5801 ± 0.1264 | 0.8386 ± 0.0077 |

GCN is the backbone:

| Method | b = 0.5 | b = 0.7 | b = 0.9 | Cycle-Tree | Graph-SST2 |
|---|---|---|---|---|---|
| GCN | 0.4091 ± 0.0398 | 0.3772 ± 0.0763 | 0.3566 ± 0.0323 | 0.3712 ± 0.0012 | 0.8208 ± 0.0165 |
| DIR | 0.4281 ± 0.0520 | 0.4471 ± 0.0312 | 0.4588 ± 0.0840 | 0.4325 ± 0.0583 | 0.8012 ± 0.0016 |
| DisC | 0.4698 ± 0.0408 | 0.4312 ± 0.0358 | 0.4713 ± 0.1390 | 0.5058 ± 0.0476 | 0.8318 ± 0.0105 |
| RGDA | 0.4687 ± 0.0855 | 0.5467 ± 0.0742 | 0.4651 ± 0.0881 | 0.5173 ± 0.0972 | 0.8269 ± 0.0077 |
| CAL | 0.4245 ± 0.0152 | 0.4355 ± 0.0278 | 0.3654 ± 0.0064 | 0.4593 ± 0.0489 | 0.8127 ± 0.0077 |
| GSAT | 0.3630 ± 0.0444 | 0.3601 ± 0.0419 | 0.3929 ± 0.0289 | 0.3474 ± 0.0031 | 0.8342 ± 0.0017 |
| DARE | 0.4609 ± 0.0648 | 0.5035 ± 0.0247 | 0.4494 ± 0.0526 | 0.4576 ± 0.0737 | 0.8266 ± 0.0046 |
| SGR | 0.4715 ± 0.0515 | 0.5582 ± 0.0518 | 0.4762 ± 0.1135 | 0.5305 ± 0.1037 | 0.8378 ± 0.0059 |

Tab.4 Graph classification accuracy (ACC) on the Spurious-Motif (b = 0.5, b = 0.7, b = 0.9, Cycle-Tree) and Graph-SST2 test sets
Fig.6 The results of identifying the ground-truth rationale subgraphs on Spurious-Motif. (a) Precision@5 on Spurious-Motif with GIN as the graph encoder; (b) Precision@5 on Spurious-Motif with GCN as the graph encoder
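Fig.6 scores rationale quality with Precision@5. Under the usual reading of this metric, nodes are ranked by the selector's scores, the top five are kept, and the reported value is the fraction that fall inside the ground-truth motif; the function below is a minimal sketch of that computation, not the authors' evaluation code.

```python
# Precision@k for rationale identification (variable names are illustrative).
import torch

def precision_at_k(node_scores: torch.Tensor, gt_mask: torch.Tensor, k: int = 5) -> float:
    """node_scores: [num_nodes] selector scores; gt_mask: [num_nodes] bool,
    True for nodes in the ground-truth rationale (the motif)."""
    topk = node_scores.topk(k).indices          # indices of the k highest-scored nodes
    return gt_mask[topk].float().mean().item()  # fraction of them inside the motif

# toy example: an 8-node graph whose first 5 nodes form the motif
scores = torch.tensor([0.9, 0.8, 0.2, 0.7, 0.6, 0.1, 0.95, 0.3])
gt = torch.tensor([True, True, True, True, True, False, False, False])
print(precision_at_k(scores, gt))  # 4 of the top-5 scored nodes are motif nodes -> 0.8
```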
Fig.7 Ablation studies of SGR with GIN on the OGBG datasets
GIN is the backbone:

| Method | MolHIV | MolToxCast | MolBACE | MolBBBP | MolSIDER |
|---|---|---|---|---|---|
| DisC | 0.7731 | 0.6662 | 0.8293 | 0.6963 | 0.5846 |
| DisC+SGR | 0.7883 (↑1.52%) | 0.6703 (↑0.41%) | 0.8343 (↑0.50%) | 0.6991 (↑0.28%) | 0.5969 (↑1.23%) |
| RGDA | 0.7714 | 0.6694 | 0.8187 | 0.6953 | 0.5864 |
| RGDA+SGR | 0.7878 (↑1.64%) | 0.6775 (↑0.81%) | 0.8256 (↑0.69%) | 0.6970 (↑0.17%) | 0.5938 (↑0.74%) |
| CAL | 0.7339 | 0.6476 | 0.7848 | 0.6582 | 0.5965 |
| CAL+SGR | 0.7699 (↑3.59%) | 0.6582 (↑1.06%) | 0.8114 (↑2.66%) | 0.6883 (↑2.93%) | 0.6021 (↑0.56%) |
| DARE | 0.7836 | 0.6677 | 0.8239 | 0.6820 | 0.5921 |
| DARE+SGR | 0.7901 (↑0.65%) | 0.6698 (↑0.21%) | 0.8296 (↑0.57%) | 0.6947 (↑1.27%) | 0.5998 (↑0.77%) |

GCN is the backbone:

| Method | MolHIV | MolToxCast | MolBACE | MolBBBP | MolSIDER |
|---|---|---|---|---|---|
| DisC | 0.7791 | 0.6626 | 0.8104 | 0.7061 | 0.6110 |
| DisC+SGR | 0.7813 (↑0.22%) | 0.6691 (↑0.65%) | 0.8197 (↑0.93%) | 0.7098 (↑0.37%) | 0.6189 (↑0.79%) |
| RGDA | 0.7816 | 0.6622 | 0.8044 | 0.6970 | 0.6133 |
| RGDA+SGR | 0.7856 (↑0.40%) | 0.6688 (↑0.66%) | 0.8193 (↑1.49%) | 0.7078 (↑1.08%) | 0.6193 (↑0.60%) |
| CAL | 0.7501 | 0.6006 | 0.7802 | 0.6635 | 0.5559 |
| CAL+SGR | 0.7737 (↑2.36%) | 0.6414 (↑4.08%) | 0.7936 (↑1.34%) | 0.6849 (↑2.14%) | 0.5976 (↑4.17%) |
| DARE | 0.7523 | 0.6618 | 0.8066 | 0.6823 | 0.6192 |
| DARE+SGR | 0.7748 (↑2.25%) | 0.6704 (↑0.86%) | 0.8146 (↑0.80%) | 0.7076 (↑2.53%) | 0.6211 (↑0.19%) |

Tab.5 Generalizability of the “learning from shortcuts” framework: each base rationalization method is followed by its SGR-augmented variant (“+SGR”), with absolute ROC-AUC gains (in percentage points) in parentheses
Fig.8 Visualization of rationale subgraphs identified by different methods trained on the Spurious-Motif Cycle-Tree dataset. (a) SGR; (b) RGDA; (c) DIR; (d) GSAT
Fig.9 Visualization of SGR rationale subgraphs, where rationale tokens are highlighted in navy blue and red lines indicate edges between two identified rationale tokens. Each graph represents a sentiment comment with a positive/negative label (e.g., the positive comment “said the film was better than saving private ryan” in (a)). (a) Training rationale: positive sentiment; (b) training rationale: negative sentiment; (c) testing rationale: positive sentiment; (d) testing rationale: negative sentiment
Fig.10 Visualization of SGR rationale subgraphs, where the selected rationale nodes are highlighted in navy blue and red lines indicate edges between two identified rationale nodes. Each graph consists of the motif (Cycle) and a base (Tree, Wheel, or Ladder). (a) Cycle-Tree; (b) Cycle-Wheel; (c) Cycle-Ladder
Fig.11 Visualization of SGR rationale subgraphs, where each graph consists of the motif (House) and a base (Tree, Wheel, or Ladder). (a) House-Tree; (b) House-Wheel; (c) House-Ladder
Fig.12 Visualization of SGR rationale subgraphs, where each graph consists of the motif (Crane) and a base (Tree, Wheel, or Ladder). (a) Crane-Tree; (b) Crane-Wheel; (c) Crane-Ladder
1. T. N. Kipf, M. Welling. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations. 2017
2. Z. Wu, Y. Gan, T. Xu, F. Wang. Graph-segmenter: graph transformer with boundary-aware attention for semantic segmentation. Frontiers of Computer Science, 2024, 18(5): 185327
3. Y. Liang, Q. Song, Z. Zhao, H. Zhou, M. Gong. BA-GNN: behavior-aware graph neural network for session-based recommendation. Frontiers of Computer Science, 2023, 17(6): 176613
4. Y. Wu, H. Huang, Y. Song, H. Jin. Soft-GNN: towards robust graph neural networks via self-adaptive data utilization. Frontiers of Computer Science, 2025, 19(4): 194311
5. W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, J. Leskovec. Open graph benchmark: datasets for machine learning on graphs. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 1855
6. Z. Guo, C. Zhang, W. Yu, J. Herr, O. Wiest, M. Jiang, N. V. Chawla. Few-shot graph learning for molecular property prediction. In: Proceedings of the Web Conference 2021. 2021, 2559−2567
7. G. Yehudai, E. Fetaya, E. A. Meirom, G. Chechik, H. Maron. From local structures to size generalization in graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 11975−11986
8. R. Ying, D. Bourgeois, J. You, M. Zitnik, J. Leskovec. GNNExplainer: generating explanations for graph neural networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 829
9. D. Luo, W. Cheng, D. Xu, W. Yu, B. Zong, H. Chen, X. Zhang. Parameterized explainer for graph neural network. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 1646
10. T. Lei, R. Barzilay, T. Jaakkola. Rationalizing neural predictions. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016, 107−117
11. X. Wang, Y. X. Wu, A. Zhang, X. He, T. S. Chua. Towards multi-grained explainability for graph neural networks. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 1410
12. S. Chang, Y. Zhang, M. Yu, T. S. Jaakkola. Invariant rationalization. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 1448−1458
13. Y. Wu, X. Wang, A. Zhang, X. He, T. S. Chua. Discovering invariant rationales for graph neural networks. In: Proceedings of the 10th International Conference on Learning Representations. 2022
14. S. Fan, X. Wang, Y. Mo, C. Shi, J. Tang. Debiasing graph neural networks via learning disentangled causal substructure. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1808
15. Y. Sui, X. Wang, J. Wu, M. Lin, X. He, T. S. Chua. Causal attention for interpretable and generalizable graph classification. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 1696−1705
16. H. Li, Z. Zhang, X. Wang, W. Zhu. Learning invariant graph representations for out-of-distribution generalization. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 859
17. C. Clark, M. Yatskar, L. Zettlemoyer. Don’t take the easy way out: ensemble based methods for avoiding known dataset biases. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 4067−4080
18. J. Nam, H. Cha, S. Ahn, J. Lee, J. Shin. Learning from failure: training debiased classifier from biased classifier. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 1736
19. Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, X. Ma. Anti-backdoor learning: training clean models on poisoned data. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 14900−14912
20. D. Arpit, S. Jastrzębski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio, S. Lacoste-Julien. A closer look at memorization in deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 233−242
21. B. Poole, S. Ozair, A. van den Oord, A. A. Alemi, G. Tucker. On variational bounds of mutual information. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 5171−5180
22. P. Cheng, W. Hao, S. Dai, J. Liu, Z. Gan, L. Carin. CLUB: a contrastive log-ratio upper bound of mutual information. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 166
23. L. Yue, Q. Liu, Y. Du, Y. An, L. Wang, E. Chen. DARE: disentanglement-augmented rationale extraction. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1929
24. A. van den Oord, Y. Li, O. Vinyals. Representation learning with contrastive predictive coding. 2018, arXiv preprint arXiv: 1807.03748
25. J. Luo, M. He, W. Pan, Z. Ming. BGNN: behavior-aware graph neural network for heterogeneous session-based recommendation. Frontiers of Computer Science, 2023, 17(5): 175336
26. S. Xiao, T. Bai, X. Cui, B. Wu, X. Meng, B. Wang. A graph-based contrastive learning framework for medicare insurance fraud detection. Frontiers of Computer Science, 2023, 17(2): 172341
27. M. S. Schlichtkrull, N. De Cao, I. Titov. Interpreting graph neural networks for NLP with differentiable edge masking. In: Proceedings of the 9th International Conference on Learning Representations. 2021
28. P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio. Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations. 2018
29. Y. Chen, Y. Zhang, Y. Bian, H. Yang, K. Ma, B. Xie, T. Liu, B. Han, J. Cheng. Learning causally invariant representations for out-of-distribution generalization on graphs. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1608
30. H. Li, X. Wang, Z. Zhang, W. Zhu. Out-of-distribution generalization on graphs: a survey. 2022, arXiv preprint arXiv: 2202.07987
31. N. Yang, K. Zeng, Q. Wu, X. Jia, J. Yan. Learning substructure invariance for out-of-distribution molecular representations. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 942
32. F. Wang, Q. Liu, E. Chen, Z. Huang, Y. Yin, S. Wang, Y. Su. NeuralCD: a general framework for cognitive diagnosis. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(8): 8312–8327
33. G. Liu, T. Zhao, J. Xu, T. Luo, M. Jiang. Graph rationalization with environment-based augmentations. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 1069−1078
34. N. Tishby, F. C. Pereira, W. Bialek. The information bottleneck method. 2000, arXiv preprint arXiv: physics/0004057
35. A. A. Alemi, I. Fischer, J. V. Dillon, K. Murphy. Deep variational information bottleneck. In: Proceedings of the 5th International Conference on Learning Representations. 2017
36. B. Paranjape, M. Joshi, J. Thickstun, H. Hajishirzi, L. Zettlemoyer. An information bottleneck approach for controlling conciseness in rationale extraction. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020, 1938−1952
37. T. Wu, H. Ren, P. Li, J. Leskovec. Graph information bottleneck. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 20437−20448
38. J. Yu, T. Xu, Y. Rong, Y. Bian, J. Huang, R. He. Graph information bottleneck for subgraph recognition. In: Proceedings of the 9th International Conference on Learning Representations. 2021
39. S. Miao, M. Liu, P. Li. Interpretable and generalizable graph learning via stochastic attention mechanism. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 15524−15543
40. R. Geirhos, J. H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, F. A. Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020, 2(11): 665–673
41. M. Du, F. He, N. Zou, D. Tao, X. Hu. Shortcut learning of large language models in natural language understanding: a survey. 2022, arXiv preprint arXiv: 2208.11857
42. L. Yue, Q. Liu, L. Wang, Y. An, Y. Du, Z. Huang. Interventional rationalization. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 11404−11418
43. L. Yue, Q. Liu, Y. Du, L. Wang, W. Gao, Y. An. Towards faithful explanations: boosting rationalization with shortcuts discovery. In: Proceedings of the 12th International Conference on Learning Representations. 2024
44. A. Rashid, V. Lioutas, M. Rezagholizadeh. MATE-KD: masked adversarial TExt, a companion to knowledge distillation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 1062−1071
45. J. Stacey, P. Minervini, H. Dubossarsky, S. Riedel, T. Rocktäschel. Avoiding the hypothesis-only bias in natural language inference via ensemble adversarial training. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020, 8281−8291
46. M. Arjovsky, L. Bottou, I. Gulrajani, D. Lopez-Paz. Invariant risk minimization. 2019, arXiv preprint arXiv: 1907.02893
47. V. Sanh, T. Wolf, Y. Belinkov, A. M. Rush. Learning from others’ mistakes: avoiding dataset biases without modeling them. In: Proceedings of the 9th International Conference on Learning Representations. 2021
48. K. Xu, W. Hu, J. Leskovec, S. Jegelka. How powerful are graph neural networks? In: Proceedings of the 7th International Conference on Learning Representations. 2019
49. G. Liu, E. Inae, T. Luo, M. Jiang. Rationalizing graph neural networks with data augmentation. ACM Transactions on Knowledge Discovery from Data, 2024, 18(4): 86
50. R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013, 1631−1642
51. M. Yu, S. Chang, Y. Zhang, T. Jaakkola. Rethinking cooperative rationalization: introspective extraction and complement control. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 4094−4103
52. R. Sun, H. Tao, Y. Chen, Q. Liu. HACAN: a hierarchical answer-aware and context-aware network for question generation. Frontiers of Computer Science, 2024, 18(5): 185321
53. D. P. Kingma, J. Ba. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. 2015