Frontiers of Computer Science
Front. Comput. Sci.    2024, Vol. 18 Issue (6) : 186505    https://doi.org/10.1007/s11704-023-3087-8
RESEARCH ARTICLE
Research on performance optimization of virtual data space across WAN
Jiantong HUO 1,2,3, Zhisheng HUO 1,2,3 (corresponding author), Limin XIAO 1,2, Zhenxue HE 4
1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3. High Performance Computing Center, Beihang University, Beijing 100191, China
4. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
Abstract

In high-performance computing across a WAN, the national supercomputing centers are geographically scattered and the network topology is complex, so it is difficult to form a unified view of their resources. To aggregate the widely dispersed storage resources of the national supercomputing centers in China, we previously proposed a global virtual data space named GVDS in the project "High Performance Computing Virtual Data Space", part of the National Key Research and Development Program of China. The GVDS enables large-scale high-performance computing applications to run efficiently across the WAN. However, the applications running on the GVDS are often data-intensive, requiring large amounts of data from multiple supercomputing centers across the WAN. As a result, the GVDS suffers from performance bottlenecks in data migration and data access across the WAN. To solve this problem, this paper proposes a performance optimization framework for the GVDS that comprises a multitask-oriented data migration method and a request access-aware IO proxy resource allocation strategy. In a WAN environment, the proposed framework makes efficient migration decisions based on the amount of data to be migrated and the number of data sources, guaranteeing a lower average migration latency when multiple migration tasks run in parallel. In addition, it ensures that the thread resources of an IO proxy node (the IO proxy is a module of the GVDS) are fairly allocated among different types of requests, thereby improving application performance across the WAN. The experimental results show that the framework effectively reduces the average data access delay of the GVDS while greatly improving application performance.
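The framework couples two decision points: a migration planner that weighs the amount of data to migrate against the number of available source centers, and an IO proxy allocator that divides a proxy node's thread pool among request types. The following minimal Python sketch only illustrates how such components could fit together; the names (MigrationTask, MigrationPlanner, IOProxyAllocator) and the even-split and proportional policies are placeholders, not the methods proposed in the paper.

    # Hypothetical sketch of the two cooperating components; names and policies are placeholders.
    from dataclasses import dataclass

    @dataclass
    class MigrationTask:
        data_mb: float       # amount of data the task has to move
        sources: list        # candidate source centers holding the data

    class MigrationPlanner:
        """Decides how much each source center contributes to a task (placeholder policy)."""
        def plan(self, tasks):
            plans = []
            for t in tasks:
                share = t.data_mb / max(len(t.sources), 1)   # naive even split across sources
                plans.append({src: share for src in t.sources})
            return plans

    class IOProxyAllocator:
        """Splits a proxy node's thread pool among request types (placeholder policy)."""
        def allocate(self, threads_total, request_counts):
            total = sum(request_counts.values()) or 1
            return {kind: max(1, round(threads_total * n / total))
                    for kind, n in request_counts.items()}

A caller would run the planner once per scheduling cycle over the pending migration tasks and feed the allocator the observed request mix on each IO proxy node.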

Keywords: storage aggregation across WANs; large-scale applications; GVDS; data migration; allocation of IO proxy resource
Corresponding Author(s): Zhisheng HUO   
Just Accepted Date: 06 June 2023   Issue Date: 21 September 2023
 Cite this article:   
Jiantong HUO, Zhisheng HUO, Limin XIAO, et al. Research on performance optimization of virtual data space across WAN[J]. Front. Comput. Sci., 2024, 18(6): 186505.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3087-8
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I6/186505
Fig.1  Performance optimization framework
Fig.2  Architecture of MODM
Parameter | Meaning
DCi | Any supercomputing center i
M | Migration task set
Mre | Timed task set that must be completed before the deadline
Mbest | Opportunistic task set for the copy data layout
m | A migration task
MB_DCi | Amount of data migrated to DCi
MB_mintime_DCi | Minimum delay of migrating MB_DCi to DCi
MB_minBW_DCi | Minimum bandwidth for migrating MB_DCi to DCi
Tab.1  Parameters of migration tasks
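The notation of Tab.1 maps naturally onto one record per migration task. The sketch below is illustrative bookkeeping only; the field names are chosen here and are not taken from the GVDS implementation.

    # Hypothetical record mirroring Tab.1's per-task parameters (field names chosen here).
    from dataclasses import dataclass

    @dataclass
    class MigrationTaskRecord:
        dest_dc: str           # DCi, the destination supercomputing center
        mb_dc: float           # MB_DCi, amount of data to migrate to DCi
        mb_mintime: float      # MB_mintime_DCi, minimum delay of migrating MB_DCi to DCi
        mb_min_bw: float       # MB_minBW_DCi, minimum bandwidth of migrating MB_DCi to DCi
        timed: bool            # True if the task belongs to Mre (deadline-bound)
        opportunistic: bool    # True if the task belongs to Mbest (copy data layout)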
Parameter | Meaning
DCi | Any supercomputing center i
IOnodej | Any IO proxy node j
BW(IOnodej) | Maximum bandwidth of IO proxy node j
FBW(IOnodej) | Maximum foreground bandwidth of IO proxy node j
BBW(IOnodej) | Maximum background bandwidth of IO proxy node j
fbw(IOnodej) | Used foreground bandwidth of IO proxy node j
bbw(IOnodej) | Used background bandwidth of IO proxy node j
BW_DCi-DCj | Maximum bandwidth between DCi and DCj
FBW_DCi-DCj | Maximum foreground bandwidth between DCi and DCj
BBW_DCi-DCj | Maximum background bandwidth between DCi and DCj
fbw_DCi-DCj | Used foreground bandwidth between DCi and DCj
bbw_DCi-DCj | Used background bandwidth between DCi and DCj
Tab.2  Parameters of supercomputing center bandwidth
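Tab.2 splits each capacity figure into a foreground share (application IO) and a background share (migration traffic), tracked both per IO proxy node and per pair of centers. The small hypothetical helper below shows how the headroom available to a migration could be derived from these quantities; the dictionary keys mirror Tab.2's symbols, but the function itself is an assumption and not part of GVDS.

    # Hypothetical helper combining Tab.2's per-link and per-node background bandwidth figures.
    def migration_headroom(link, node):
        # link: {'BBW': max background bandwidth between DCi and DCj, 'bbw': share already used}
        # node: {'BBW': max background bandwidth of the destination IO proxy node, 'bbw': used share}
        free_link = max(link['BBW'] - link['bbw'], 0.0)
        free_node = max(node['BBW'] - node['bbw'], 0.0)
        return min(free_link, free_node)   # a migration can exceed neither constraint

    # migration_headroom({'BBW': 100, 'bbw': 40}, {'BBW': 80, 'bbw': 50})  -> 30.0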
Fig.3  Allocation of IBAM
Parameter | Meaning
T | The Tth bandwidth allocation cycle
l | Maximum bandwidth limit for each migration task
r | Minimum bandwidth requirement for each migration task
p | Amount of bandwidth allocated to a migration task
L | L label for sorting
R | R label for sorting
P | P label for sorting
syncUnit | Largest data transmission unit when performing migration
t | Execution time of bandwidth allocation
MB_left | Amount of data remaining to be migrated in the migration task
Tab.3  Parameters of bandwidth allocation for migration tasks
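A per-cycle allocator consistent with Tab.3 would first satisfy every task's minimum requirement r and then distribute the leftover bandwidth up to each task's limit l, visiting tasks in the order given by the L/R/P labels. The sketch below implements that simple water-filling reading; it is one plausible interpretation of the parameters, not the IBAM algorithm itself.

    # Hypothetical per-cycle bandwidth allocation consistent with Tab.3's parameters.
    def allocate_cycle(tasks, total_bw):
        # tasks: dicts with 'r' (minimum requirement), 'l' (maximum limit) and 'label' (sort key).
        remaining = total_bw
        for t in tasks:                        # stage 1: guarantee every task its minimum r
            t['p'] = t['r']
            remaining -= t['r']
        for t in sorted(tasks, key=lambda x: x['label']):
            extra = min(t['l'] - t['p'], max(remaining, 0.0))
            t['p'] += extra                    # stage 2: top up toward the limit l, in label order
            remaining -= extra
        return [t['p'] for t in tasks]         # p granted to each task in this cycle T

Oversubscription of the minima is not handled in this sketch; the actual method presumably also accounts for the transmission granularity syncUnit and the remaining data MB_left of each task.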
Fig.4  Implementation of MODM in the GVDS
Fig.5  Process of MODM
Fig.6  Design of RASS
Parameter | Meaning
IOnode | IO proxy node
S | Data space, the basic data storage unit in the GVDS
F | File in the GVDS, the basic unit of user operations
B | Data block of a file, the logical organization of the file
IOPS(S) | IOPS to the data space S in the cycle
M | Mapping relationship between data spaces and IO proxy nodes
Tab.4  Parameters of IO proxy node’s workload model
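The workload model of Tab.4 is essentially a mapping M from data spaces S to IO proxy nodes, annotated with the IOPS observed against each space during a cycle. A minimal, hypothetical sketch of that bookkeeping follows; the class and method names are illustrative and not from the GVDS source.

    # Hypothetical bookkeeping for Tab.4's workload model of the IO proxy nodes.
    from collections import defaultdict

    class ProxyWorkloadModel:
        def __init__(self):
            self.mapping = {}                 # M: data space S -> IO proxy node
            self.iops = defaultdict(int)      # IOPS(S) observed during the current cycle

        def record_request(self, space):
            self.iops[space] += 1             # one file/block request hitting data space S

        def load_of(self, node):
            # Total IOPS of all data spaces currently mapped to this IO proxy node.
            return sum(count for space, count in self.iops.items()
                       if self.mapping.get(space) == node)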
Fig.7  Mapping between data space and IO proxy node of AWPA
Fig.8  Mapping between data space and IO proxy node of RTDO
Parameter | Meaning
R | IO request set processed by the IO proxy node
Ri | IO request set with priority i
num | Number of IO requests
T | The Tth cycle of thread resource allocation
lat_limt | Maximum queuing delay that an IO request can accept
lat_avg | Average queuing delay of IO requests, calculated by sampling
diff | Difference between the maximum queuing delay and the average queuing delay
Td | Number of threads in the thread pool of the IO proxy node
Td_limt | Minimum number of threads allocated to each type of request
Td_prop | Extra number of threads allocated to each type of request
Tab.5  Parameters of RTDO
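Tab.5 suggests a two-stage split of the proxy node's thread pool: every request type first receives a floor of Td_limt threads, and the remaining threads are handed out as Td_prop, weighted by how small the headroom diff = lat_limt - lat_avg has become for each type. The sketch below encodes that reading; it is an assumption about the shape of RTDO, not the published algorithm.

    # Hypothetical thread-pool split over request types, patterned on Tab.5's parameters.
    def split_threads(classes, pool_size, floor):
        # classes: dict name -> {'lat_limt': max acceptable queuing delay, 'lat_avg': sampled average}
        alloc = {name: floor for name in classes}                 # Td_limt threads per type
        spare = max(pool_size - floor * len(classes), 0)          # threads left over for Td_prop
        diff = {name: max(c['lat_limt'] - c['lat_avg'], 1e-6) for name, c in classes.items()}
        weight = {name: 1.0 / d for name, d in diff.items()}      # less headroom -> more weight
        total = sum(weight.values()) or 1.0
        for name in classes:
            alloc[name] += int(spare * weight[name] / total)
        return alloc

Under this reading, a request type whose average queuing delay is already close to its bound receives most of the spare threads in the next cycle T.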
Fig.9  Implementation of AWPA and RTDO in the GVDS
Fig.10  Process of RASS
Hardware and software | Configuration
Operating system | Ubuntu 18.04 64-bit (better supports the GVDS)
CPU | ecs.g7.large (2 cores)
Memory | 8 GB
HDD | 40 GB
Kernel version | 4.15
WAN bandwidth | 100 Mbps
GVDS version | 1.6.9 (GVDS including MODM and RASS)
Tab.6  Hardware and software configurations
Fig.11  Network topology of MODM for performance
Fig.12  Comparison of data migration method. (a) Total running time of all migration tasks; (b) bandwidth of all migration tasks
Fig.13  Network topology of MODM for fairness
Fig.14  Comparison of multiple migration tasks’ bandwidth allocation
Fig.15  Network topology of AWPA
Fig.16  Comparison of read and write access bandwidth. (a) Bandwidth comparison of read; (b) bandwidth comparison of write
Fig.17  Network topology of RTDO
Fig.18  IOPS comparison of create and remove. (a) IOPS comparison of create; (b) IOPS comparison of remove
Fig.19  IOPS comparison of stat
Fig.20  Experimental deployment of GVDS
Supercomputing center | # of servers | Storage capacity/TB | # of cores
CAS2266.44112 cores
SSC41208.121136 cores
NSCC-JN487.4156 cores
NSCC-GZ41.45192 cores
NCSS-CS2715.7564 cores
Tab.7  Experimental configuration
Supercomputing center | BUAA | NSCC-JN | SSC | NSCC-GZ | NSCC-CS | CAS (all bandwidths in MB·s⁻¹)
BUAA | - | 31.2 ± 26.5 | 17.6 ± 15.0 | 10.7 ± 0.6 | 7.4 ± 3.0 | 102.7 ± 14.6
NSCC-JN | 31.2 ± 26.5 | - | 33.5 ± 20.0 | 3.9 ± 2.5 | 3.3 ± 6.9 | 102.2 ± 21.7
SSC | 17.6 ± 15.0 | 33.5 ± 20.0 | - | 8.9 ± 2.9 | 24.9 ± 7.5 | 67.8 ± 19.4
NSCC-GZ | 10.7 ± 0.6 | 3.9 ± 2.5 | 8.9 ± 2.9 | - | 3.8 ± 2.7 | 9.6 ± 0.8
NSCC-CS | 7.4 ± 3.0 | 3.3 ± 6.9 | 24.9 ± 7.5 | 3.8 ± 2.7 | - | 5.9 ± 0.9
CAS | 102.7 ± 14.6 | 102.2 ± 21.7 | 67.8 ± 19.4 | 9.6 ± 0.8 | 5.9 ± 0.9 | -
Tab.8  Network bandwidth between any two supercomputing centers
Fig.21  Network bandwidth status between supercomputing centers
Fig.22  Comparison of ReID between the original GVDS and the extended GVDS. (a) Comparison of subtask running time; (b) comparison of total running time
Size of data block/KB | Sequential read bandwidth (NSCC-CS to GZ)/(MB/s) | Sequential read bandwidth (NSCC-JN to GZ)/(MB/s) | Bandwidth utilization/%
8 | 3.554 | 3.526 | 91.13
16 | 3.608 | 3.59 | 92.65
32 | 3.659 | 3.614 | 93.62
64 | 3.612 | 3.636 | 93.28
128 | 3.643 | 3.538 | 92.44
256 | 3.606 | 3.463 | 91.02
512 | 3.701 | 3.517 | 92.94
1024 | 3.729 | 3.587 | 94.19
2048 | 3.598 | 3.561 | 92.14
4096 | 3.698 | 3.613 | 94.12
8192 | 3.666 | 3.661 | 94.31
Average | 3.643 | 3.573 | 92.90
Tab.9  Bandwidth utilization for sequential read
Size of data block/KB | Sequential write bandwidth (NSCC-CS to GZ)/(MB/s) | Sequential write bandwidth (NSCC-JN to GZ)/(MB/s) | Bandwidth utilization/%
8 | 3.76 | 3.764 | 96.84
16 | 3.736 | 3.692 | 95.62
32 | 3.433 | 3.736 | 92.19
64 | 3.734 | 3.798 | 96.93
128 | 3.687 | 3.811 | 96.46
256 | 3.772 | 3.811 | 97.59
512 | 3.647 | 3.782 | 95.59
1024 | 3.793 | 3.805 | 97.78
2048 | 3.68 | 3.673 | 94.65
4096 | 3.646 | 3.683 | 94.32
8192 | 3.624 | 3.787 | 95.35
Average | 3.76 | 3.764 | 95.76
Tab.10  Bandwidth utilization for sequential write