Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (1) : 171102    https://doi.org/10.1007/s11704-022-1176-8
RESEARCH ARTICLE
Research on key technologies of edge cache in virtual data space across WAN
Jiantong HUO1,2,3, Yaowen XU6, Zhisheng HUO1,2,3,4, Limin XIAO1,2, Zhenxue HE5
1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3. High Performance Computing Center, Beihang University, Beijing 100191, China
4. College of Software, Beihang University, Beijing 100191, China
5. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
6. College of Computer Science and Technology, Zhejiang University, Hangzhou 310013, China
Abstract

The authors have previously proposed the global virtual data space system (GVDS), which aggregates the scattered and autonomous storage resources of China’s national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center in Chinese Academy of Sciences) into a single storage system that spans the wide area network (WAN) and provides unified management of global storage resources in China. The GVDS has been successfully deployed in the China National Grid environment. However, when remote data is accessed and shared across the WAN, the GVDS transmits data redundantly and wastes a large amount of network bandwidth. In this paper, we propose an edge cache system that supplements the GVDS and improves the performance of upper-level applications that access and share remote data. Specifically, we first design the architecture of the edge cache system and then study its key technologies: an edge cache index mechanism based on double-layer hashing, an edge cache replacement strategy based on the GDSF algorithm, request routing based on consistent hashing, and cluster member maintenance based on the SWIM protocol. The experimental results show that, in terms of function, the edge cache system implements the relevant file operations (read, write, deletion, modification, etc.) and is compatible with the POSIX interface. In terms of performance, when the accessed file is located in the edge cache system, it greatly reduces the amount of data transmitted and increases the data access bandwidth; its performance is close to that of a network file system in a local area network (LAN).
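The abstract names GDSF (Greedy-Dual-Size-Frequency) as the basis of the replacement strategy without giving details on this page. As a rough illustration only, the Python sketch below implements the textbook GDSF priority K = L + F × C / S, where F is the access frequency, S the file size, C a retrieval cost, and L an inflation value that is raised to the priority of the most recently evicted file. The class name `GDSFCache`, the uniform cost of 1, and the byte-based capacity are assumptions of this sketch, not the paper's implementation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class GDSFCache:
    """Toy GDSF cache keyed by file path; sizes and capacity are in bytes."""
    capacity: int
    used: int = 0
    inflation: float = 0.0                         # the "L" term of GDSF
    entries: dict = field(default_factory=dict)    # key -> [size, freq, priority]
    heap: list = field(default_factory=list)       # (priority, key); may hold stale items

    def _priority(self, size, freq, cost=1.0):
        # GDSF key: K = L + frequency * cost / size (cost assumed uniform here)
        return self.inflation + freq * cost / size

    def access(self, key, size):
        """Record an access to `key`; return True on a hit, False on a miss."""
        if key in self.entries:
            entry = self.entries[key]
            entry[1] += 1
            entry[2] = self._priority(entry[0], entry[1])
            heapq.heappush(self.heap, (entry[2], key))
            return True
        if size > self.capacity:
            return False                           # too large to cache at all
        while self.used + size > self.capacity:
            self._evict()
        priority = self._priority(size, 1)
        self.entries[key] = [size, 1, priority]
        heapq.heappush(self.heap, (priority, key))
        self.used += size
        return False

    def _evict(self):
        """Remove the entry with the lowest current priority and raise L to it."""
        while self.heap:
            priority, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is not None and entry[2] == priority:   # skip stale heap items
                self.inflation = priority
                self.used -= entry[0]
                del self.entries[key]
                return
        raise RuntimeError("nothing left to evict")
```

In this sketch a large, rarely read file gets a low priority and is evicted before a small, frequently read one, which is the behaviour GDSF is designed to produce.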

Keywords: virtual data space system, wide area network, edge cache, redundant data transmission
Corresponding Author(s): Zhisheng HUO   
Just Accepted Date: 29 October 2021   Issue Date: 01 March 2022
 Cite this article:   
Jiantong HUO, Yaowen XU, Zhisheng HUO, et al. Research on key technologies of edge cache in virtual data space across WAN[J]. Front. Comput. Sci., 2023, 17(1): 171102.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1176-8
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I1/171102
Fig.1  Deployment of GVDS
Fig.2  Weather data processing flow
Fig.3  Implementation of edge cache in GVDS (IAP, CAS: Institute of Atmospheric Physics, Chinese Academy of Sciences; SIMM, CAS: Shanghai Institute of Materia Medica, Chinese Academy of Sciences; IHEP, CAS: Institute of High Energy Physics, Chinese Academy of Sciences)
Fig.4  Edge caching workflow
Fig.5  Mapping of file global path to cache file metadata in the first layer
Fig.6  Mapping of file fingerprint to file UUID in the second layer
Fig.7  Edge cache file replacement process
Fig.8  Edge cache cluster architecture
Fig.9  Consistent hash based on virtual node
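Fig.9 refers to consistent hashing with virtual nodes, which the abstract names as the basis of request routing. A minimal Python sketch of that general technique follows. The class name `HashRing`, the use of MD5, the `node#i` virtual-node naming, and the default of 100 virtual nodes per server are all assumptions made for illustration; the page does not describe the system's actual parameters.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring in which each physical node owns many virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []    # sorted ring positions of all virtual nodes
        self._owner = {}     # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def route(self, global_path):
        """Return the node owning the first virtual node clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(global_path))
        return self._owner[self._points[idx % len(self._points)]]
```

With this layout, removing one server only remaps the paths that server owned; requests for every other path keep going to the same node, which is why the technique suits a cache cluster whose membership changes.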
| Type of lock | S | SX | X |
|---|---|---|---|
| S | compatible | compatible | conflict |
| SX | compatible | conflict | conflict |
| X | conflict | conflict | conflict |
Tab.1  Lock compatibility
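The compatibility matrix in Tab.1 can be encoded directly as a lookup. The sketch below is one hedged way to do so in Python; the names `Lock`, `compatible`, and `can_grant` are hypothetical, and only the compatible/conflict values are taken from the table.

```python
from enum import Enum


class Lock(Enum):
    S = "S"
    SX = "SX"
    X = "X"


# Unordered pairs marked "compatible" in Tab.1; every other pair conflicts.
_COMPATIBLE = {
    frozenset({Lock.S}),            # S with S
    frozenset({Lock.S, Lock.SX}),   # S with SX (the table is symmetric)
}


def compatible(held: Lock, requested: Lock) -> bool:
    """Return True if `requested` can coexist with an already held lock."""
    return frozenset({held, requested}) in _COMPATIBLE


def can_grant(requested: Lock, held_locks) -> bool:
    """A request is granted only if it is compatible with every held lock."""
    return all(compatible(held, requested) for held in held_locks)
```

For example, `can_grant(Lock.SX, [Lock.S, Lock.S])` is true under this encoding, while a second SX request against a held SX lock is refused.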
Fig.10  EA accesses the edge cache in the presence of the lock mechanism
Fig.11  Metadata locking
Fig.12  File lock conflict
Fig.13  Locking process when maintaining file data
Fig.14  Edge cache and GVDS synchronization
| Interface | Specification | Parameter description |
|---|---|---|
| Metadata query interface of edge cache system | CacheMeta EdgeCacheSearch(string global_path, string ioproxy_info, string relative_path, string token) | global_path: file global path; ioproxy_info: the IO agent information that manages the file; relative_path: the relative path of the IO agent; token: client certificate; return: cached metadata information, or failure |
| File writing interface of edge cache system | Int64 EdgeCacheWrite(string global_path, string ioproxy_info, string relative_path, byte[] buf, int64 offset, int64 length) | global_path: file global path; ioproxy_info: the IO agent information that manages the file; relative_path: the relative path of the IO agent; buf: file buffer; offset: start position of file writing; length: the length of the data written to the file; return: the amount of data written, 0 (write successful), -1 (write failed) |
| File reading interface of edge cache system | Int64 EdgeCacheRead(string global_path, string ioproxy_info, string relative_path, byte[] buf, int64 offset, int64 length) | global_path: file global path; ioproxy_info: the IO agent information that manages the file; relative_path: the relative path of the IO agent; buf: file buffer; offset: start position of file reading; length: the length of the data read from the file; return: the amount of data read (read successful), -1 (read failed) |
| File closing interface of edge cache system | Int64 EdgeCacheClose(string global_path, string ioproxy_info, string relative_path) | global_path: file global path; ioproxy_info: the IO agent information that manages the file; relative_path: the relative path of the IO agent; return: 0 (closed successfully), -1 (close failed) |
Tab.2  Cached data access interface
| Interface | Specification | Return value | Interface description |
|---|---|---|---|
| Cache file add interface | POST /cache; Json Value; \r\n | Value; 200 OK; 500 Internal Server Error | The management interface adds the file corresponding to the specified global path to the edge cache system |
| Cache file query interface | POST /cache; Json Value; \r\n | Value; 200 OK; 500 Internal Server Error | The management interface queries the file corresponding to the specified global path in the edge cache system |
| Cache file update interface | POST /cache; Json Value; \r\n | Value; 200 OK; 500 Internal Server Error | The management interface synchronizes the file corresponding to the specified global path in the edge cache system |
| Cache file delete interface | POST /cache; Json Value; \r\n | Value; 200 OK; 500 Internal Server Error | The management interface deletes the file corresponding to the specified global path in the edge cache system |
| Edge cache cluster status query interface | GET /cluster | Value; 200 OK; 500 Internal Server Error | The management interface obtains the cluster status information of the edge cache service node |
| Edge cache storage capacity status query interface | GET /storage | Value; 200 OK; 500 Internal Server Error | The management interface queries the storage capacity status of the edge cache system |
Tab.3  Restful management interface of edge cache system
| Configuration | Description |
|---|---|
| CPU | Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz; 1 Physical Processor, 8 Cores, 8 Threads |
| Memory | DDR4, 3200MHz, 8GB x 4 (32GB) |
| System SSD | SAMSUNG 250GB M.2 NVMe 970 EVO Plus |
| Operating system | Ubuntu 18.04.3 LTS Linux 5.4.0 x86_64 |
| Motherboard | MSI MAG Z390 TOMAHAWK |
| Network card | Intel(R) Ethernet Connection I219-V |
Tab.4  PC configuration for functional evaluation
| Brand model | Description | Number of switches |
|---|---|---|
| HUAWEI S1700-24GR | 24-port full gigabit enterprise Ethernet switch | 2 |
Tab.5  LAN switch
| Configuration | Description |
|---|---|
| CPU | Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz; 1 Physical Processor, 8 Cores, 8 Threads |
| Memory | DDR4, 3200MHz, 8GB x 4 (32GB) |
| System SSD | Toshiba 256GB M.2 NVMe 2230 SSD |
| Data SSD | SanDisk 960GB SATA3.0 SSD |
| Operating system | Microsoft Windows 10 Professional 10.0.19041 x86_64 |
| Motherboard | DELL OptiPlex 7070 0NRKPK A03 |
| Network card | Intel(R) Ethernet Connection I219-LM |
Tab.6  Client configuration of edge cache system in LAN
| Brand model | Description | Number of servers |
|---|---|---|
| DELL R740 | CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz; 2 Physical Processors, 24 Cores, 48 Threads | 3 |
| | Memory: 128GB | |
| | Storage: 1TB SSD + 86TB HDD | |
| | Operating system: Ubuntu 18.04.5 LTS Linux 4.15.0 x86_64 | |
| | Network card: BCM5720-2P - 2 x 1GbE PCIe NIC | |
Tab.7  Server configuration in LAN
| Brand model | Description | Number of servers |
|---|---|---|
| DELL R740 | CPU: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz; 2 Physical Processors, 28 Cores, 56 Threads | 2 |
| | Memory: 64GB | |
| | Storage: 44TB HDD | |
| | Operating system: Ubuntu 18.04.5 LTS Linux 4.15.0 x86_64 | |
Tab.8  Server configuration of supercomputing center
| Software | Version |
|---|---|
| GCC | 7.5.0 |
| libfuse | 3.8.0 |
| Python | 3.6.9 |
| C++ | C++11 |
| gRPC, Protobuf | 1.27.2, 3.13.0 |
| Go | 1.14.7 |
| VirtualBox | 6.1.16 |
| WANem | WANem 3.0 Beta |
| fio | 3.1 |
| iperf | 3.1.3 |
Tab.9  Software tools
Fig.15  Virtual machine network topology
| Host role | CPU | Memory | Hard disk | Operating system |
|---|---|---|---|---|
| WAN simulator | 1 cpu*1 core | 2GB | 8GB | Debian 6.0.2 i686 |
| Edge client, Edge service node | 2 cpu*2 core | 4GB | 20GB | Ubuntu 18.04.2 x86_64 |
| GVDS | 8 cpu*8 core | 8GB | 20GB | Ubuntu 18.04.5 x86_64 |
Tab.10  Hardware configuration of virtual machine
| Host name | IP address | Subnet mask | Default gateway |
|---|---|---|---|
| WAN simulator | 10.0.2.100; 10.1.3.100 | 255.255.255.0 | 10.0.2.1 |
| Edge clients 1, 2 | 10.0.2.101-102 | 255.255.255.0 | 10.0.2.1 |
| Edge servers 1, 2, 3 | 10.0.2.103-105 | 255.255.255.0 | 10.0.2.1 |
| GVDS | 10.1.3.106 | 255.255.255.0 | 10.1.3.1 |
Tab.11  Network configuration of virtual host
Fig.16  File query command running result
Fig.17  File addition command running result
Fig.18  File synchronization command running result
Fig.19  File deletion command running result
Fig.20  State of the underlying file system Lustre
Fig.21  State of the client mounting
Fig.22  Evaluation of request routing method based on consistent hash algorithm
Fig.23  Client operation information
Fig.24  Log information of edge service node 1
Fig.25  Log information of edge service node 2
Fig.26  Metadata and directory operations of client
Fig.27  File operations of client
Fig.28  Node topology in WAN
| Host | IP address | Subnet mask | Default gateway |
|---|---|---|---|
| Edge client | 10.134.150.58 | 255.255.255.0 | 10.134.150.1 |
| Edge server | 10.134.150.79 | 255.255.255.0 | 10.134.150.1 |
| GVDS | 211.137.201.247 | 255.255.255.0 | 192.168.10.1 |
Tab.12  IP configuration of server in WAN
| Network environment | Upstream bandwidth | Downstream bandwidth (MB/s) | Latency (ms) | Jitter (ms) | Packet loss rate (%) |
|---|---|---|---|---|---|
| Loopback network | 7.8 GB/s | 7.9 | 0.04 | 0.01 | 0 |
| Local area network | 117.8 MB/s | 117.8 | 0.12 | 0.02 | 0 |
| Wide area network | 4.6 MB/s | 7.3 | 21.22 | 2.29 | 1 |
Tab.13  Network performance
Fig.29  Metadata access performance comparison between GVDS and GeCache in the case of multi-process concurrency. (a) File creation; (b) File query; (c) File deletion
Fig.30  File access performance comparison between GVDS and GeCache. (a) Sequential write performance changes; (b) Sequential read performance changes; (c) Random write performance changes; (d) Random read performance changes
Fig.31  File access performance comparison between GVDS and GeCache in the case of multi-process concurrency. (a) Multi-process single-file write performance changes; (b) Multi-process single file read performance changes; (c) Multi-process multi-file write performance changes; (d) Multi-process multi-file read performance changes
Fig.32  File access performance comparison between NFS and GeCache in the case of multi-process concurrency. (a) Multi-process single-file write performance changes; (b) Multi-process single file read performance changes; (c) Multi-process multi-file write performance changes; (d) Multi-process multi-file read performance changes