Research on key technologies of edge cache in virtual data space across WAN

doi:10.1007/s11704-022-1176-8

Front. Comput. Sci., 2023, Vol. 17, Issue 1: 171102. https://doi.org/10.1007/s11704-022-1176-8

RESEARCH ARTICLE


Jiantong HUO¹,²,³, Yaowen XU⁶, Zhisheng HUO¹,²,³,⁴, Limin XIAO¹,², Zhenxue HE⁵

¹. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
². School of Computer Science and Engineering, Beihang University, Beijing 100191, China
³. High Performance Computing Center, Beihang University, Beijing 100191, China
⁴. College of Software, Beihang University, Beijing 100191, China
⁵. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
⁶. College of Computer Science and Technology, Zhejiang University, Hangzhou 310013, China


Abstract

The authors of this paper previously proposed the global virtual data space system (GVDS), which aggregates the scattered and autonomous storage resources of China’s national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center of the Chinese Academy of Sciences) into a storage system that spans the wide area network (WAN) and provides unified management of these storage resources. The GVDS has been successfully deployed in the China National Grid environment. However, when remote data is accessed and shared across the WAN, the GVDS transmits data redundantly and wastes a large amount of network bandwidth. In this paper, we propose an edge cache system as a supplement to the GVDS to improve the performance of upper-level applications that access and share remote data. Specifically, we first design the architecture of the edge cache system and then study its key technologies: an edge cache index mechanism based on double-layer hashing, an edge cache replacement strategy based on the GDSF algorithm, a request routing method based on consistent hashing, and a cluster member maintenance method based on the SWIM protocol. The experimental results show that, functionally, the edge cache system implements the relevant operations (read, write, deletion, modification, etc.) and is compatible with the POSIX interface. In terms of performance, when the accessed file resides in the edge cache, the system greatly reduces the amount of data transmitted and increases data access bandwidth, so that its performance is close to that of a network file system in a local area network (LAN).

Keywords: virtual data space system, wide area network, edge cache, redundant data transmission

Corresponding Author(s): Zhisheng HUO

Just Accepted Date: 29 October 2021; Issue Date: 01 March 2022

Cite this article:

Jiantong HUO, Yaowen XU, Zhisheng HUO, et al. Research on key technologies of edge cache in virtual data space across WAN[J]. Front. Comput. Sci., 2023, 17(1): 171102.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1176-8
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I1/171102

Fig.1 Deployment of GVDS

Fig.2 Weather data processing flow

Fig.3 Implementation of edge cache in GVDS (IAP, CAS: Institute of Atmospheric Physics, Chinese Academy of Sciences; SIMM, CAS: Shanghai Institute of Materia Medica, Chinese Academy of Sciences; IHEP, CAS: Institute of High Energy Physics, Chinese Academy of Sciences)

Fig.4 Edge caching workflow

Fig.5 Mapping of file global path to cache file metadata in the first layer

Fig.6 Mapping of file fingerprint to file UUID in the second layer
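
The two mappings named in Fig.5 and Fig.6 can be illustrated with a minimal sketch. It is not the authors' implementation: the class and field names, the use of SHA-256 as the file fingerprint, and the example paths are all assumptions made for illustration. The first layer maps a file's global path to its cache metadata, and the second layer maps the content fingerprint to the UUID of the cached file, so two paths with identical content share one cached copy.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class CacheMeta:
    """Hypothetical first-layer metadata record (field names are assumptions)."""
    global_path: str
    size: int
    fingerprint: str


class DoubleLayerIndex:
    """Sketch of a two-layer hash index: path -> metadata, fingerprint -> UUID."""

    def __init__(self):
        self.path_to_meta = {}          # layer 1: global path -> CacheMeta
        self.fingerprint_to_uuid = {}   # layer 2: fingerprint -> cached file UUID

    def add(self, global_path, data):
        """Index a file and return the UUID of the cached copy that holds its content."""
        # Assumption: the fingerprint is a SHA-256 digest of the file content.
        fingerprint = hashlib.sha256(data).hexdigest()
        self.path_to_meta[global_path] = CacheMeta(global_path, len(data), fingerprint)
        # Identical content reuses the existing UUID instead of storing a second copy.
        if fingerprint not in self.fingerprint_to_uuid:
            self.fingerprint_to_uuid[fingerprint] = str(uuid.uuid4())
        return self.fingerprint_to_uuid[fingerprint]

    def lookup(self, global_path):
        """Return the cached file UUID for a path, or None on a cache miss."""
        meta = self.path_to_meta.get(global_path)
        if meta is None:
            return None
        return self.fingerprint_to_uuid.get(meta.fingerprint)
```

In this sketch a lookup costs two dictionary probes, and adding the same content under a second path only creates a new first-layer entry.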

Fig.7 Edge cache file replacement process
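
The GDSF (Greedy-Dual-Size-Frequency) policy behind the replacement process can be sketched as follows. This is a minimal illustration, not the system's code: it assumes a uniform retrieval cost of 1 for every file, uses a linear scan to find the victim for clarity, and the class name and example keys are hypothetical. Each file gets priority H = L + F * C / S, where F is its access frequency, C its cost, S its size, and L an inflation value that is raised to the priority of the most recently evicted file.

```python
class GDSFCache:
    """Sketch of GDSF replacement with a uniform cost (an assumption)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0    # inflation value L
        self.entries = {}   # key -> [size, frequency, priority]

    def _priority(self, size, freq, cost=1.0):
        # H = L + F * C / S; size must be positive.
        return self.clock + freq * cost / size

    def access(self, key, size):
        """Return True on a hit; on a miss, admit the file and return False."""
        if key in self.entries:
            entry = self.entries[key]
            entry[1] += 1
            entry[2] = self._priority(entry[0], entry[1])
            return True
        if size > self.capacity:
            return False  # larger than the whole cache: do not admit
        # Evict lowest-priority files until the new file fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            victim_size, _, victim_priority = self.entries.pop(victim)
            self.used -= victim_size
            self.clock = victim_priority  # age the remaining entries implicitly
        self.entries[key] = [size, 1, self._priority(size, 1)]
        self.used += size
        return False
```

With a uniform cost, large and rarely used files are evicted first, which suits a cache whose goal is to avoid repeated WAN transfers of frequently read data. A production version would typically keep the entries in a priority queue rather than scanning them.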

Fig.8 Edge cache cluster architecture

Fig.9 Consistent hash based on virtual node
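
Consistent hashing with virtual nodes, as named in Fig.9, can be sketched as below. The node names, the number of virtual nodes per server, and the choice of SHA-256 as the ring hash are assumptions for illustration and are not taken from the paper. Each edge cache node is placed on the ring many times, and a request is routed to the owner of the first virtual node at or after the hash of its key.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Sketch of a consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Assumption: the first 8 bytes of SHA-256 serve as the ring position.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        """Insert `vnodes` virtual nodes for one physical node."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Drop every virtual node that belongs to the given physical node."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key):
        """Return the node responsible for the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping at the end.
        i = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if i == len(self._ring):
            i = 0
        return self._ring[i][1]
```

When a node leaves the ring, only the keys it owned are remapped; requests for all other keys keep going to the same node, so most cached files stay reachable after a membership change.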

Tab.1 Lock compatibility

Fig.10 EA accesses the edge cache in the presence of the lock mechanism

Fig.11 Metadata locking

Fig.12 File lock conflict

Fig.13 Locking process when maintaining file data

Fig.14 Edge cache and GVDS synchronization

Interface	Specification	Parameter description
Metadata query interface of edge cache system	CacheMeta EdgeCacheSearch(string global_path, string ioproxy_info, string relative_path, string token)	global_path: file global path; ioproxy_info: information about the IO agent that manages the file; relative_path: the relative path of the IO agent; token: client credential; return: the cached metadata, or a failure indication
File writing interface of edge cache system	Int64 EdgeCacheWrite(string global_path, string ioproxy_info, string relative_path, byte[] buf, int64 offset, int64 length)	global_path: file global path; ioproxy_info: information about the IO agent that manages the file; relative_path: the relative path of the IO agent; buf: file buffer; offset: start position of the write; length: the length of the data to write; return: the amount of data written, 0 (write successful), -1 (write failed)
File reading interface of edge cache system	Int64 EdgeCacheRead(string global_path, string ioproxy_info, string relative_path, byte[] buf, int64 offset, int64 length)	global_path: file global path; ioproxy_info: information about the IO agent that manages the file; relative_path: the relative path of the IO agent; buf: file buffer; offset: start position of the read; length: the length of the data to read; return: the amount of data read (read successful), -1 (read failed)
File closing interface of edge cache system	Int64 EdgeCacheClose(string global_path, string ioproxy_info, string relative_path)	global_path: file global path; ioproxy_info: information about the IO agent that manages the file; relative_path: the relative path of the IO agent; return: 0 (close successful), -1 (close failed)

Tab.2 Cached data access interface
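
The offset and length conventions of the read and write interfaces in Tab.2 can be illustrated with a small in-memory stand-in. It is purely hypothetical and is not the GVDS client library: it omits the ioproxy_info and relative_path parameters, keeps file contents in a dictionary, and assumes that a successful write returns the number of bytes written, while -1 signals failure as in the table.

```python
class InMemoryEdgeCache:
    """Hypothetical stand-in that mimics the Tab.2 return conventions."""

    def __init__(self):
        self._files = {}  # global_path -> bytearray holding the cached content

    def write(self, global_path, buf, offset, length):
        """Write `length` bytes of `buf` at `offset`; return bytes written or -1."""
        if offset < 0 or length < 0 or length > len(buf):
            return -1
        data = self._files.setdefault(global_path, bytearray())
        end = offset + length
        if len(data) < end:
            data.extend(b"\x00" * (end - len(data)))  # zero-fill any gap
        data[offset:end] = buf[:length]
        return length

    def read(self, global_path, buf, offset, length):
        """Copy up to `length` bytes from `offset` into `buf`; return bytes read or -1."""
        data = self._files.get(global_path)
        if data is None or offset < 0 or length < 0:
            return -1
        chunk = data[offset:offset + length]
        buf[:len(chunk)] = chunk
        return len(chunk)
```

A caller would pass a writable buffer such as a `bytearray` to `read` and check the return value before using the buffer contents.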

Tab.3 Restful management interface of edge cache system

Tab.4 PC configuration for functional evaluation

Tab.5 LAN switch

Tab.6 Client configuration of edge cache system in LAN

Tab.7 Server configuration in LAN

Tab.8 Server configuration of supercomputing center

Tab.9 Software tools

Fig.15 Virtual machine network topology

Tab.10 Hardware configuration of virtual machine

Tab.11 Network configuration of virtual host

Fig.16 File query command running result

Fig.17 File addition command running result

Fig.18 File synchronization command running result

Fig.19 File deletion command running result

Fig.20 State of the underlying file system Lustre

Fig.21 State of the client mounting

Fig.22 Evaluation of request routing method based on consistent hash algorithm

Fig.23 Client operation information

Fig.24 Log information of edge service node 1

Fig.25 Log information of edge service node 2

Fig.26 Metadata and directory operations of client

Fig.27 File operations of client

Fig.28 Node topology in WAN

Tab.12 IP configuration of server in WAN

Tab.13 Network performance

Fig.29 Metadata access performance comparison between GVDS and GeCache in the case of multi-process concurrency. (a) File creation; (b) File query; (c) File deletion

Fig.30 File access performance comparison between GVDS and GeCache. (a) Sequential write performance changes; (b) Sequential read performance changes; (c) Random write performance changes; (d) Random read performance changes

Fig.31 File access performance comparison between GVDS and GeCache in the case of multi-process concurrency. (a) Multi-process single-file write performance changes; (b) Multi-process single-file read performance changes; (c) Multi-process multi-file write performance changes; (d) Multi-process multi-file read performance changes

Fig.32 File access performance comparison between NFS and GeCache in the case of multi-process concurrency. (a) Multi-process single-file write performance changes; (b) Multi-process single-file read performance changes; (c) Multi-process multi-file write performance changes; (d) Multi-process multi-file read performance changes

