Please wait a minute...
Frontiers of Engineering Management

ISSN 2095-7513

ISSN 2096-0255(Online)

CN 10-1205/N

Postal Subscription Code 80-905

Front. Eng    2022, Vol. 9 Issue (4) : 653-667    https://doi.org/10.1007/s42524-022-0217-1
RESEARCH ARTICLE
Classifying multiclass relationships between ASes using graph convolutional network
Songtao PENG1, Xincheng SHU2, Zhongyuan RUAN1(), Zegang HUANG1, Qi XUAN1
1. Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2. Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
 Download: PDF(3767 KB)   HTML
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Precisely understanding the business relationships between autonomous systems (ASes) is essential for studying the Internet structure. To date, many inference algorithms, which mainly focus on peer-to-peer (P2P) and provider-to-customer (P2C) binary classification, have been proposed to classify the AS relationships and have achieved excellent results. However, business-based sibling relationships and structure-based exchange relationships have become an increasingly nonnegligible part of the Internet market in recent years. Existing algorithms are often difficult to infer due to the high similarity of these relationships to P2P or P2C relationships. In this study, we focus on multiclassification of AS relationship for the first time. We first summarize the differences between AS relationships under the structural and attribute features, and the reasons why multiclass relationships are difficult to be inferred. We then introduce new features and propose a graph convolutional network (GCN) framework, AS-GCN, to solve this multiclassification problem under complex scenes. The proposed framework considers the global network structure and local link features concurrently. Experiments on real Internet topological data validate the effectiveness of our method, that is, AS-GCN. The proposed method achieves comparable results on the binary classification task and outperforms a series of baselines on the more difficult multiclassification task, with an overall metrics above 95%.

Keywords autonomous system      multiclass relationship      graph convolutional network      classification algorithm      Internet topology     
Corresponding Author(s): Zhongyuan RUAN   
Just Accepted Date: 20 September 2022   Online First Date: 02 November 2022    Issue Date: 08 December 2022
 Cite this article:   
Songtao PENG,Xincheng SHU,Zhongyuan RUAN, et al. Classifying multiclass relationships between ASes using graph convolutional network[J]. Front. Eng, 2022, 9(4): 653-667.
 URL:  
https://academic.hep.com.cn/fem/EN/10.1007/s42524-022-0217-1
https://academic.hep.com.cn/fem/EN/Y2022/V9/I4/653
Fig.1  Brief description of the Internet topology and AS relationships.
DatasetTotal linksReal situationIntersectionCoincidence rateIntersection accuracy
AS-RankProbLinkTopoScope
201626563325429725446625853822127283.30%98.13%
201725569424611724975224975521446583.88%97.06%
201826321925455425470025535221428181.97%99.26%
Tab.1  Training dataset established from the intersection of the three methods
DatasetValidation (P2C/P2P)All links(S2S/X2X)
04/01/201444474 (26210/18264)1461671663/1164
04/01/201542512 (26528/15984)1690191906/1881
04/01/201649074 (28214/20860)2215732133/3957
04/01/201749604 (29362/20242)2197132276/2686
04/01/201844868 (30284/14584)2232212760/4467
Tab.2  Validation dataset
Fig.2  Fraction of the misclassification of complex relationships by different algorithms.
Fig.3  Schematic of the aggregating surrounding information process of the GCN model.
Fig.4  Analysis of the distance to clique, assign VP, and common neighbor ratio. (a) Cumulative distribution function (CDF) of absolute distance between ASes to clique for different relationships. (b) Distribution of the number of VPs with a threshold of 110 that can be detected on each relationship’s links. (c) Distribution of different common neighbor ratio for different relationships.
Fig.5  Cumulative distribution function of the distance to VP using different calculation methods on four types of AS relationships.
Fig.6  (a) Hierarchical ratio of the AS-level Internet topology; (b) Distribution of the four AS types between P2P and X2X.
No.FeaturesKey pointsNewly proposed
1DegreeStructure-based
2Transit degreeStructure-based
3Distance to cliqueInternet-based
4Assign VPInternet-based
5Common neighbor ratioStructure-based
6Distance to VP minInternet-based
7Distance to VP maxInternet-based
8Distance to VP meanInternet-based
9Node hierarchyInternet-based
10AS typeInternet-based
Tab.3  Features table
Fig.7  Model framework of AS-GCN.
DatasetTraditional inference methodsFeature-based methodsAS-GCN
AS-RankProbLinkTopoScopeSVM1NNRFXgboostLightGBMMLP
04/01/201696.7596.8096.4461.7481.5093.1494.9291.7477.3097.51
04/01/201794.1995.0893.7762.7081.1082.0491.1088.9078.0097.51
04/01/201897.3796.5596.2173.3174.1079.4692.6586.1869.2097.20
Tab.4  Accuracy of AS link classification methods for binary classification experiment
DatasetLinksMethodsAccuracy (all)PrecisionMacro PrecisionRecallMacro Recall
P2PP2CS2SX2XP2PP2CS2SX2X
04/01/201611093SVM73.7856.0099.0084.0081.0080.0096.0083.0089.0030.0074.50
1NN81.4078.0075.0087.0088.0082.0073.0082.0088.0085.0082.00
RF94.7194.3799.7988.9995.1994.5997.2097.2094.2990.4994.80
Xgboost94.6694.8798.9989.3494.8694.5296.2097.8093.8191.0494.71
LightGBM94.2694.4398.5891.6392.2494.2295.0097.0091.1990.4993.42
MLP66.2455.0077.0076.0060.0067.0065.0064.0094.0049.0068.00
AS-GCN95.3294.3799.6092.3694.6095.2397.2098.8092.1492.8795.25
04/01/201710962SVM73.6056.0099.0082.0077.0078.5094.0081.0089.0030.0073.50
1NN79.5776.0074.0087.0085.0080.5071.0079.0089.0082.0080.25
RF92.6290.1799.1786.6894.6692.6797.2095.2092.7685.8592.75
Xgboost93.2892.6998.3787.2194.7293.2596.4096.4094.1286.7893.43
LightGBM93.3392.3498.3987.8394.4193.2496.4097.6091.4088.0893.37
MLP64.9151.0071.0077.0071.0067.5073.0062.0095.0034.0066.00
AS-GCN95.5495.8599.4093.7693.2895.5797.2099.2091.0394.4295.46
04/01/201813227SVM76.6055.0099.0086.0089.0082.2595.0079.0089.0055.0079.50
1NN82.7775.0077.0091.0087.0082.5072.0083.0087.0087.0082.25
RF92.9891.8093.9691.2394.0892.7794.0096.4088.3093.2692.99
Xgboost93.9693.6995.4591.3594.7793.8295.0096.4089.6294.5793.90
LightGBM92.6991.4996.7891.9991.5792.9694.6096.2084.5394.4692.45
MLP71.8356.0084.0089.0070.0074.7570.0047.0089.0079.0071.25
AS-GCN95.3895.4299.0093.2694.6095.5795.8099.4092.0994.9495.56
Tab.5  Accuracy, Precision, and Recall of the multiclassification experiment
Fig.8  Parameter sensitivity analysis for AS-GCN model.
Fig.9  Feature importance analysis on AS-GCN and its comparison methods.
Features removedAccuracyImportance scoreAverage ranking
Common neighbor ratio91.4515.603.25
Transit degree91.5115.377.75
Distance to VP mean92.7210.764.75
Distance to clique92.8810.142
Degree92.969.843.75
Hierarchy93.019.658.25
AS type93.298.587
Distance to VP min93.308.546
Assign VP94.035.765.75
Distance to VP max94.035.766.50
Tab.6  Feature importance analysis
Fig.10  Visualization of multiclassification obtained by AS-GCN using t-SNE algorithm.
1 M ApostolakiG MartiJ MüllerL Vanbever (2018). SABRE: Protecting Bitcoin against routing attacks. arXiv preprint. arXiv:1808.06254
2 S Arber, J J Hunter, J Ross Jr, M Hongo, G Sansig, J Borg, J C Perriard, K R Chien, P Caroni, (1997). MLP-deficient mice exhibit a disruption of cardiac cytoarchitectural organization, dilated cardiomyopathy, and heart failure. Cell, 88( 3): 393–403
https://doi.org/10.1016/S0092-8674(00)81878-4 pmid: 9039266
3 T BöttgerG AntichiE L FernandesLallo R diM BruyereS UhligI Castro (2018). The elusive Internet flattening: 10 years of IXP growth. arXiv preprint. arXiv:1810.10963
4 L Breiman, (2001). Random forests. Machine Learning, 45( 1): 5–32
https://doi.org/10.1023/A:1010933404324
5 S H B Brito, M A S Santos, R dos Reis Fontes, D A L Perez, H D L da Silva, C R E Rothenberg, (2016). An analysis of the largest national ecosystem of public Internet exchange points: The case of Brazil. Journal of Communication and Information Systems, 31( 1): 256–271
https://doi.org/10.14209/jcis.2016.23
6 S Carmi, S Havlin, S Kirkpatrick, Y Shavitt, E Shir, (2007). A model of Internet topology using k-shell decomposition. Proceedings of the National Academy of Sciences of the United States of America, 104( 27): 11150–11154
https://doi.org/10.1073/pnas.0701175104 pmid: 17586683
7 I Castro, J C Cardona, S Gorinsky, P Francois, (2014). Remote peering: More peering without Internet flattening. In: Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies. Sydney: Association for Computing Machinery, 185–198
8 T ChenC (2016) Guestrin. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA: Association for Computing Machinery, 785–794
9 S Cho, R Fontugne, K Cho, A Dainotti, P Gill, (2019). BGP hijacking classification. In: Network Traffic Measurement and Analysis Conference (TMA). Paris: IEEE, 25–32
10 A Cohen, Y Gilad, A Herzberg, M Schapira, (2016). Jumpstarting BGP security with path-end validation. In: Proceedings of the ACM SIGCOMM Conference. Florianopolis: Association for Computing Machinery, 342–355
11 C Cortes, V Vapnik, (1995). Support-vector networks. Machine Learning, 20( 3): 273–297
https://doi.org/10.1007/BF00994018
12 A Dhamdhere, D D Clark, A Gamero-Garrido, M Luckie, R K P Mok, G Akiwate, K Gogia, V Bajpai, A C Snoeren, K Claffy, (2018). Inferring persistent interdomain congestion. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication. Budapest: Association for Computing Machinery, 1–15
13 Battista G DiM PatrignaniM (2003) Pizzonia. Computing the types of the relationships between autonomous systems. In: IEEE INFOCOM. 22nd Annual Joint Conference of the IEEE Computer and Communications Societies. San Francisco, CA: IEEE, 156–165
14 X Dimitropoulos, D Krioukov, M Fomenkov, B Huffaker, Y Hyun, K C Claffy, G Riley, (2007). AS relationships: Inference and validation. Computer Communication Review, 37( 1): 29–40
https://doi.org/10.1145/1198255.1198259
15 J Friedman, T Hastie, R Tibshirani, (2000). Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Annals of Statistics, 28( 2): 337–407
https://doi.org/10.1214/aos/1016218223
16 L Gao, (2001). On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking, 9( 6): 733–745
https://doi.org/10.1109/90.974527
17 P GillM ArlittZ LiA (2008) Mahanti. The flattening Internet topology: Natural evolution, unsightly barnacles or contrived collapse? In: International Conference on Passive and Active Network Measurement. Cleveland, OH: Springer, 1–10
18 P Gill, M Schapira, S Goldberg, (2011). Let the market drive deployment: A strategy for transitioning to BGP security. Computer Communication Review, 41( 4): 14–25
https://doi.org/10.1145/2043164.2018439
19 V GiotsasM LuckieB HuffakerK (2014) Claffy. Inferring complex AS relationships. In: Proceedings of the Conference on Internet Measurement Conference. Vancouver, BC: Association for Computing Machinery, 23–30
20 V GiotsasS (2012) Zhou. Valley-free violation in Internet routing: Analysis based on BGP community data. In: IEEE International Conference on Communications (ICC). Ottawa, ON: IEEE, 1193–1197
21 E Gregori, A Improta, L Lenzini, L Rossi, L Sani, (2011). BGP and inter-AS economic relationships. In: International Conference on Research in Networking. Valencia: Springer, 54–67
22 Y JinC ScottA DhamdhereV GiotsasA KrishnamurthyS (2019) Shenker. Stable and practical AS relationship inference with ProbLink. In: Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation. Boston, MA: USENIX Association, 581–597
23 Z JinX ShiY YangX YinZ WangJ (2020) Wu. TopoScope: Recover AS relationships from fragmentary observations. In: Proceedings of the ACM Internet Measurement Conference. New York, NY: Association for Computing Machinery, 266–280
24 J Karlin, S Forrest, J Rexford, (2008). Autonomous security for autonomous systems. Computer Networks, 52( 15): 2908–2923
https://doi.org/10.1016/j.comnet.2008.06.012
25 E Katz-BassettD R ChoffnesÍ CunhaC ScottT AndersonA (2011) Krishnamurthy. Machiavellian routing: Improving Internet availability with BGP poisoning. In: Proceedings of the 10th ACM Workshop on Hot Topics in Networks. Cambridge, MA: Association for Computing Machinery, 1–6
26 G KeQ MengT FinleyT WangW ChenW MaQ YeT Y (2017) Liu. LightGBM: A highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA: Curran Associates Inc., 3149–3157
27 D P KingmaJ Ba (2014). Adam: A method for stochastic optimization. arXiv preprint. arXiv:1412.6980
28 T N KipfM Welling (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint. arXiv:1609.02907
29 C Labovitz, S Iekel-Johnson, D McPherson, J Oberheide, F Jahanian, (2010). Internet inter-domain traffic. Computer Communication Review, 40( 4): 75–86
https://doi.org/10.1145/1851275.1851194
30 M Luckie, B Huffaker, A Dhamdhere, V Giotsas, K Claffy, (2013). AS relationships, customer cones, and validation. In: Proceedings of the Conference on Internet Measurement Conference. Barcelona: Association for Computing Machinery, 243–256
31 R Motamedi, R Rejaie, W Willinger, (2015). A survey of techniques for Internet topology discovery. IEEE Communications Surveys and Tutorials, 17( 2): 1044–1065
https://doi.org/10.1109/COMST.2014.2376520
32 C OrsiniA KingD GiordanoV GiotsasA (2016) Dainotti. BGPStream: A software framework for live and historical BGP data analysis. In: Proceedings of the Internet Measurement Conference. Santa Monica, CA: Association for Computing Machinery, 429–444
33 A PaszkeS GrossS ChintalaG ChananE YangZ DeVitoZ LinA DesmaisonL AntigaA (2017) Lerer. Automatic differentiation in PyTorch. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA: Curran Associates Inc., 1–4
34 L E Peterson, (2009). K-nearest neighbor. Scholarpedia, 4( 2): 1883
https://doi.org/10.4249/scholarpedia.1883
35 S Y Qiu, P D McDaniel, F Monrose, (2007). Toward valley-free inter-domain routing. In: IEEE International Conference on Communications. Glasgow: IEEE, 2009–2016
36 T Shapira, Y Shavitt, (2020). Unveiling the type of relationship between autonomous systems using deep learning. In: IEEE/IFIP Network Operations and Management Symposium. Budapest: IEEE, 1–6
37 J M SmithM (2018) Schuchard. Routing around congestion: Defeating DDoS attacks and adverse network conditions via reactive BGP routing. In: IEEE Symposium on Security and Privacy (SP). San Francisco, CA: IEEE, 599–617
38 J Susan VargheseL Ruan (2015). A machine learning approach to edge type prediction in Internet AS graphs. Online Paper
39 S Sundaresan, X Deng, Y Feng, D Lee, A Dhamdhere, (2017). Challenges in inferring Internet congestion using throughput measurements. In: Proceedings of the Internet Measurement Conference. London: Association for Computing Machinery, 43–56
40 M E Tozal, (2018). Policy-preferred paths in AS-level Internet topology graphs. Theory and Applications of Graphs, 5( 1): 3
https://doi.org/10.20429/tag.2018.050103
41 L van der Maaten, G Hinton, (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9( 86): 2579–2605
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed