Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (1) : 181701    https://doi.org/10.1007/s11704-022-2078-5
Image and Graphics
VSAN: A new visualization method for super-large-scale academic networks
Qi LI1, Xingli WANG2, Luoyi FU2, Xinde CAO3, Xinbing WANG1, Jing ZHANG4, Chenghu ZHOU5
1. Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
3. School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
4. School of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, China
5. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
Abstract

As a carrier of knowledge, papers have been a popular choice since ancient times for documenting everything from major historical events to breakthroughs in science and technology. With the rapid development of science and technology, the number of papers has been growing exponentially. Just as the Internet of Things (IoT) allows the world to be connected in a flatter way, what will the network formed by massive numbers of academic papers look like? Most existing visualization methods can handle only up to hundreds of thousands of nodes, far fewer than academic networks, which usually comprise millions of nodes or more. In this paper, we are thus motivated to break this scale limit and design a new visualization method specifically for super-large-scale academic networks (VSAN). Nodes represent papers or authors, while edges represent the relations (e.g., citation, coauthorship) between them. To improve the visualization comprehensively, the design of VSAN addresses three levels of optimization in a progressive manner: the scale that can be supported, the loading speed, and the quality of layout details. Our main contributions are twofold: 1) We design an equivalent segmentation layout method that goes beyond the limit encountered by state-of-the-art methods, making it possible to visually reveal the correlations among larger-scale academic entities. 2) We further propose a hierarchical slice loading approach that enables users to observe the visualized graphs of the academic network at both macroscopic and microscopic levels, with the ability to zoom quickly between levels. In addition, we propose a “jumping between nebula graphs” method that connects the static pages of many academic graphs and helps users form a more systematic and comprehensive understanding of various academic networks.
Applying our methods to three academic paper citation datasets in the AceMap database confirms the scalability of VSAN: it can visualize academic networks with more than 4 million nodes. The super-large-scale visualization not only reveals a galaxy-like scholarly picture that had not been observed previously, but also yields some interesting observations that may attract further attention from scientists.

Keywords: academic networks, large graph visualization, graph layout, graph loading
Corresponding Author(s): Xinbing WANG   
About author:

Changjian Wang and Zhiying Yang contributed equally to this work.

Just Accepted Date: 08 November 2022   Issue Date: 02 March 2023
 Cite this article:   
Qi LI, Xingli WANG, Luoyi FU, et al. VSAN: A new visualization method for super-large-scale academic networks[J]. Front. Comput. Sci., 2024, 18(1): 181701.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2078-5
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I1/181701
Fig.1  Overall framework of VSAN. The segmentation layout method helps to get the layout of the super-large-scale academic network. The hierarchical slice loading approach enables fast interactive presentation of the network. In addition, users are able to jump between different pages of networks by clicking nodes in nebula graphs
Fig.2  Schematic diagram of the grid division after graph layout
Fig.3  Overlap removal of slices. We can see that after applying the NoverlapLayout algorithm, nodes are not overlapping anymore while the shape of the graph is kept

| Dataset | Nodes | Edges | Description |
| --- | --- | --- | --- |
| Feather-lastfm-social | 7,624 | 27,806 | Social network of LastFM users from Asia |
| Musae-facebook | 22,470 | 171,002 | Facebook page-page network with page names |
| Soc-Slashdot0811 | 77,360 | 905,468 | Slashdot social network from November 2008 |
| Com-Amazon | 334,863 | 925,872 | Amazon product network |
| Nature | 2,053,310 | 3,426,847 | Nature citation relationship data including 19 fields |
| DBLP | 4,328,431 | 36,032,774 | Citation relationship data in the field of computer science |
| Galaxy | – | – | 47,310 academic networks led by highly cited papers |

Tab.1  Datasets used in experiments

| Dataset | VSAN | ForceAtlas2 | FMMM | Pivot MDS | GRIP | FR | KK |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Feather-lastfm-social | 0.2374 | 0.1999 | 0.1197 | 0.1206 | 0.0543 | 0.0169 | 0.0163 |
| Musae-facebook | 0.2946 | 0.2092 | 0.1178 | 0.0893 | 0.0479 | 0.0058 | 0.0061 |
| Soc-Slashdot0811 | 0.0713 | 0.0525 | 0.0509 | 0.0631 | 0.0381 | 0.0349 | – |
| Com-Amazon | 0.3639 | – | 0.0583 | 0.0854 | 0.0324 | – | – |
| Nature | 0.3748 | – | 0.1327 | 0.2563 | – | – | – |
| DBLP | 0.0261 | – | – | – | – | – | – |

Tab.2  Layout quality of different algorithms

| Dataset | VSAN | ForceAtlas2 | FMMM | Pivot MDS | GRIP | FR | KK |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Feather-lastfm-social | 6.09 | 6.48 | 4.70 | 1.01 | 2.28 | 9.02 | 58.03 |
| Musae-facebook | 17.78 | 106.20 | 9.44 | 3.16 | 17.83 | 53.47 | 899.95 |
| Soc-Slashdot0811 | 152.37 | 3765.30 | 41.04 | 22.07 | 180.52 | 703.52 | – |
| Com-Amazon | 207.17 | – | 162.64 | 47.77 | 1350.34 | – | – |
| Nature | 3304.79 | – | 818.58 | 558.72 | – | – | – |
| DBLP | 2128.80 | – | – | – | – | – | – |

Tab.3  Layout time for different algorithms (s)

| Dataset | Graph segmentation | Subgraph layout | Structure equivalence | Equivalent structure layout | Stitching |
| --- | --- | --- | --- | --- | --- |
| Feather-lastfm-social | 1.31 | 3.69 | 0.13 | 0.72 | 0.25 |
| Musae-facebook | 5.42 | 9.72 | 0.61 | 0.86 | 1.17 |
| Soc-Slashdot0811 | 20.52 | 126.20 | 1.66 | 1.39 | 2.60 |
| Com-Amazon | 94.11 | 96.07 | 4.80 | 1.93 | 10.26 |
| Nature | 368.76 | 2775.98 | 108.84 | 0.76 | 50.46 |
| DBLP | – | 1795.84 | 165.47 | 80.54 | 86.95 |

Tab.4  Running time for main steps of VSAN (s)
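
As a quick consistency check, the per-step times in Tab.4 can be summed and compared with the VSAN column of Tab.3. The sketch below does this with values copied from the two tables. The variable and function names are illustrative only, and treating the missing DBLP graph-segmentation entry as contributing no time is an assumption made here, not something the tables state.

```python
# Per-step VSAN running times (s), copied from Tab.4 in the order:
# graph segmentation, subgraph layout, structure equivalence,
# equivalent structure layout, stitching.
# None marks the entry that is missing in the table (assumed to add no time).
STEP_TIMES = {
    "Feather-lastfm-social": [1.31, 3.69, 0.13, 0.72, 0.25],
    "Musae-facebook": [5.42, 9.72, 0.61, 0.86, 1.17],
    "Soc-Slashdot0811": [20.52, 126.20, 1.66, 1.39, 2.60],
    "Com-Amazon": [94.11, 96.07, 4.80, 1.93, 10.26],
    "Nature": [368.76, 2775.98, 108.84, 0.76, 50.46],
    "DBLP": [None, 1795.84, 165.47, 80.54, 86.95],
}

# Total VSAN layout times (s), copied from the VSAN column of Tab.3.
TOTAL_TIMES = {
    "Feather-lastfm-social": 6.09,
    "Musae-facebook": 17.78,
    "Soc-Slashdot0811": 152.37,
    "Com-Amazon": 207.17,
    "Nature": 3304.79,
    "DBLP": 2128.80,
}


def step_sum(times):
    """Sum the available step times, skipping missing entries."""
    return sum(t for t in times if t is not None)


def max_gap():
    """Largest absolute difference between summed steps and reported total."""
    return max(abs(step_sum(STEP_TIMES[name]) - TOTAL_TIMES[name])
               for name in STEP_TIMES)


if __name__ == "__main__":
    for name, times in STEP_TIMES.items():
        print(f"{name}: steps={step_sum(times):.2f} s, total={TOTAL_TIMES[name]:.2f} s")
```

Under that assumption the summed steps agree with the reported totals to within about 0.01 s for every dataset, which is consistent with rounding of the individual entries.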

| Leading article of nebula graph | Nodes | Edges | SVG loading | Hierarchical slice loading |
| --- | --- | --- | --- | --- |
| 3D Convolutional Neural Networks for Human Action Recognition | 1600 | 8252 | 1.66 | 1.10 |
| Mastering the game of Go with deep neural networks and tree search | 3035 | 7332 | 1.90 | |
| Reducing the dimensionality of data with neural networks | 6781 | 33618 | 1.96 | |
| ImageNet: A large-scale hierarchical image database | 8807 | 54767 | 2.65 | |
| Deep Residual Learning for Image Recognition | 15405 | 84777 | 4.54 | |
| Diffusion of Innovations | 34636 | 151439 | 6.88 | |
| The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations | 53802 | 254402 | 11.52 | |
| Statistical Power Analysis for the Behavioral Sciences | 94727 | 341637 | 17.83 | |

Tab.5  Loading time for SVG and VSAN (s)
Fig.4  Visualization of Nature citation relationship by VSAN. (a) Inter-block layout; (b) layout of some of the subgraphs (communities); (c) the ultimate academic network layout. In (b), communities are clustered in different structures. Some communities have multiple cores, such as community_4, community_5, and community_8; some communities have only one core, e.g., community_1
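
The relation between panels (a), (b), and (c) of Fig.4 can be illustrated with a small sketch: each community is laid out in its own local coordinates, and the local layouts are then placed at the positions given by the inter-block layout. The code below is a minimal illustration under the assumption that stitching is a plain scale-and-translate step; the function `stitch`, its parameters, and the per-block radius are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def stitch(block_pos: Dict[str, Point],
           block_radius: Dict[str, float],
           local_layouts: Dict[str, Dict[str, Point]]) -> Dict[str, Point]:
    """Place each community's local layout at its block position.

    Local coordinates are rescaled so that the node farthest from the local
    origin ends up at distance block_radius[block] from the block centre.
    This is an assumed, simplified stitching rule.
    """
    global_pos: Dict[str, Point] = {}
    for block, layout in local_layouts.items():
        if not layout:
            continue  # nothing to place for an empty community
        cx, cy = block_pos[block]
        # Largest distance from the local origin; fall back to 1.0 when all
        # nodes sit exactly at the origin, to avoid dividing by zero.
        extent = max((x * x + y * y) ** 0.5 for x, y in layout.values()) or 1.0
        scale = block_radius[block] / extent
        for node, (x, y) in layout.items():
            global_pos[node] = (cx + scale * x, cy + scale * y)
    return global_pos


if __name__ == "__main__":
    positions = stitch(
        block_pos={"community_1": (0.0, 0.0), "community_4": (10.0, 0.0)},
        block_radius={"community_1": 2.0, "community_4": 1.0},
        local_layouts={
            "community_1": {"a": (1.0, 0.0), "b": (0.0, -1.0)},
            "community_4": {"c": (0.0, 4.0), "d": (0.0, 0.0)},
        },
    )
    print(positions)
```

Keeping the local layouts independent of the block positions means the subgraph layouts can be computed separately and combined afterwards, which is the property this sketch is meant to show.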
Fig.5  The layout of DBLP based on clustering of academic conferences or journals by VSAN. (a) the inter-cluster galaxy structure; (b) the macro overview of the distribution of conferences and journals in the field of computer science; (c) distribution of conferences and journals related to computer vision; (d) distribution of conferences and journals related to computer networks; (e) distribution of nodes inside TIT, a top journal in the information theory field
Fig.6  Subgraph layout effects with different layout algorithms. The first row ((a)–(g)) shows the layout of the same community from the musae-facebook dataset under different layout algorithms, while the community in the second row ((h)–(n)) is from the Nature dataset. The layout algorithm for each column is ForceAtlas2 with overlap removal ((a), (h)), ForceAtlas2 ((b), (i)), FMMM ((c), (j)), Pivot MDS ((d), (k)), GRIP ((e), (l)), FR ((f), (m)), and KK ((g), (n)), respectively
Fig.7  The process of gradual zooming in the Nature citation relationship network using VSAN. Zooming in the network can be accomplished by scrolling the mouse wheel or clicking on the zoom controls, enabling a more and more detailed view of the enormous network
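
One way to picture the zooming in Fig.7 is as a slice pyramid, similar to the tiles used by web maps: only the slices that overlap the current viewport need to be loaded at each zoom level. The sketch below assumes that level z splits a square layout into 2^z by 2^z equal slices. This pyramid scheme, the function `visible_slices`, and its parameters are assumptions for illustration and may differ from how VSAN actually organizes its slices.

```python
import math
from typing import List, Tuple


def visible_slices(level: int, extent: float,
                   view: Tuple[float, float, float, float]) -> List[Tuple[int, int]]:
    """Return (col, row) indices of the slices at `level` that overlap `view`.

    The layout is assumed to occupy the square [0, extent] x [0, extent],
    split into 2**level slices per side. `view` is
    (x_min, y_min, x_max, y_max) in the same coordinates.
    """
    n = 2 ** level
    size = extent / n
    x_min, y_min, x_max, y_max = view
    # Clamp the index range to the grid so that viewports partly outside
    # the layout do not produce invalid slice indices.
    c0 = max(0, math.floor(x_min / size))
    c1 = min(n - 1, math.floor(x_max / size))
    r0 = max(0, math.floor(y_min / size))
    r1 = min(n - 1, math.floor(y_max / size))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


if __name__ == "__main__":
    view = (30.0, 30.0, 60.0, 40.0)
    for level in range(4):
        print(level, visible_slices(level, 100.0, view))
```

With a fixed viewport size on screen, the number of slices returned stays small at every level, so the amount of data fetched per view does not grow with the size of the whole network.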
Fig.8  The visualization method of super-large-scale academic networks based on jumping between nebula graphs by VSAN. (a) Navigation page of nebula graph of computer science; (b) jumping between nebula graphs by the navigation page
Fig.9  Examples of 6 typical nebula graphs
  
  
  
  
  
  
  
  Fig.A1 Coordinate transformation diagram.
  Fig.A2 Schematic diagram of node, label and edge division and the determination of slice number. In (a), (b) and (c), the light blue square indicates the range of the slice. In (a), circles 1–4 indicate the drawing ranges of the nodes. Information of nodes 1–3 will be stored in the set corresponding to that slice, while information of node 4 will not be stored in it, because the drawing range of node 4 does not intersect with the slice range. Similarly, information of labels 2–3 and edges 1–2 will be stored in the set corresponding to that slice, while information of label 1 and edge 3 will not be stored in it. In (d), (e) and (f), slices in yellow will store the information of the corresponding node, label or edge
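
The node rule described in Fig.A2 (a node is stored in every slice that its drawing range intersects) can be sketched as a circle-versus-square test on a uniform grid. The code below is a minimal illustration assuming circular drawing ranges and square slices of equal side length; the function names and the grid indexing are hypothetical.

```python
import math
from typing import Set, Tuple


def circle_hits_square(cx: float, cy: float, r: float,
                       x0: float, y0: float, size: float) -> bool:
    """True if the disc centred at (cx, cy) with radius r intersects the
    square [x0, x0 + size] x [y0, y0 + size] (touching counts)."""
    # Closest point of the square to the disc centre.
    nx = min(max(cx, x0), x0 + size)
    ny = min(max(cy, y0), y0 + size)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def slices_for_node(cx: float, cy: float, r: float,
                    size: float) -> Set[Tuple[int, int]]:
    """All (col, row) slices of a uniform grid that the node's disc touches."""
    # Candidate slices come from the disc's bounding box; the exact test
    # then discards corner slices that the disc does not actually reach.
    c0, c1 = math.floor((cx - r) / size), math.floor((cx + r) / size)
    r0, r1 = math.floor((cy - r) / size), math.floor((cy + r) / size)
    return {(c, row)
            for c in range(c0, c1 + 1)
            for row in range(r0, r1 + 1)
            if circle_hits_square(cx, cy, r, c * size, row * size, size)}


if __name__ == "__main__":
    print(slices_for_node(5.0, 5.0, 2.0, 10.0))   # node well inside one slice
    print(slices_for_node(9.0, 5.0, 2.0, 10.0))   # node crossing a slice border
    print(slices_for_node(9.0, 9.0, 1.2, 10.0))   # node near a slice corner
```

A node near a slice corner shows why the exact test matters: its bounding box covers four slices, but the disc may reach only three of them.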
  Fig.A3 The extreme values on the curve are not taken at the beginning or end of the edge
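
The situation in Fig.A3 matters when deciding which slices a curved edge belongs to: if the curve bulges beyond its endpoints, a bounding range computed from the endpoints alone misses part of the edge. The sketch below assumes, purely for illustration, that a curved edge is a quadratic Bézier curve with control points P0, P1, P2; the source does not state which curve type VSAN uses. For one coordinate, the interior extremum lies at t = (p0 - p1) / (p0 - 2 p1 + p2) whenever that value falls strictly between 0 and 1.

```python
from typing import Tuple

Point = Tuple[float, float]


def axis_range(p0: float, p1: float, p2: float) -> Tuple[float, float]:
    """Min and max of one coordinate of a quadratic Bezier curve, t in [0, 1]."""
    candidates = [p0, p2]  # the endpoints are always candidates
    denom = p0 - 2.0 * p1 + p2
    if denom != 0.0:
        # Root of the derivative of (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2.
        t = (p0 - p1) / denom
        if 0.0 < t < 1.0:
            candidates.append((1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t * t * p2)
    return min(candidates), max(candidates)


def bezier_bbox(p0: Point, p1: Point, p2: Point) -> Tuple[float, float, float, float]:
    """Bounding box (x_min, y_min, x_max, y_max) of the whole curve."""
    x_lo, x_hi = axis_range(p0[0], p1[0], p2[0])
    y_lo, y_hi = axis_range(p0[1], p1[1], p2[1])
    return (x_lo, y_lo, x_hi, y_hi)


if __name__ == "__main__":
    # Both endpoints have y = 0, yet the curve rises above them in between.
    print(bezier_bbox((0.0, 0.0), (5.0, 10.0), (10.0, 0.0)))
```

In the example both endpoints lie at y = 0, so an endpoint-only range would be flat, while the curve itself peaks in the middle of the edge.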