1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3. High Performance Computing Center, Beihang University, Beijing 100191, China
4. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
In high-performance computing over a WAN, national supercomputing centers are geographically scattered and the network topology is complex, so it is difficult to form a unified view of their resources. To aggregate the widely dispersed storage resources of China's national supercomputing centers, we previously proposed a global virtual data space named GVDS in the "High Performance Computing Virtual Data Space" project, part of the National Key Research and Development Program of China. GVDS enables large-scale high-performance computing applications to run efficiently across WANs. However, applications running on GVDS are often data-intensive, requiring large amounts of data from multiple supercomputing centers, so GVDS suffers from performance bottlenecks in cross-WAN data migration and access. To solve this problem, this paper proposes a performance optimization framework for GVDS comprising a multitask-oriented data migration method and a request access-aware IO proxy resource allocation strategy. In a WAN environment, the framework makes efficient migration decisions based on the amount of data to be migrated and the number of data sources, guaranteeing lower average migration latency when multiple data migration tasks run in parallel. In addition, it ensures that the thread resources of the IO proxy node (the IO proxy is a module of GVDS) are allocated fairly among different types of requests, improving application performance across WANs. Experimental results show that the framework effectively reduces the average data access delay of GVDS while greatly improving application performance.
Jiantong HUO, Zhisheng HUO, Limin XIAO, Zhenxue HE. Research on performance optimization of virtual data space across WAN. Front. Comput. Sci., 2024, 18(6): 186505.
| Parameter | Meaning |
|---|---|
|  | Minimum bandwidth requirement of each migration task |
|  | Amount of bandwidth allocated to a migration task |
| L | Label for sorting |
| R | Label for sorting |
| P | Label for sorting |
|  | Largest data transmission unit used when performing migration |
|  | Execution time of bandwidth allocation |
|  | Amount of data remaining to be migrated in a migration task |
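To make the roles of these parameters concrete, the sketch below walks through one bandwidth-allocation cycle over a set of parallel migration tasks. It is a minimal illustration rather than the MODM algorithm itself: the policy of first guaranteeing each task's minimum bandwidth and then granting the leftover shortest-remaining-data-first is an assumption chosen because it lowers average completion time, and all identifiers (`MigrationTask`, `allocate_bandwidth`, `cycle_s`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MigrationTask:
    task_id: str
    min_bandwidth: float    # minimum bandwidth requirement (MB/s)
    remaining_data: float   # amount of data remaining to be migrated (MB)
    allocated: float = 0.0  # bandwidth allocated in the current cycle (MB/s)

def allocate_bandwidth(tasks: list[MigrationTask], total: float,
                       cycle_s: float = 1.0) -> None:
    """One allocation cycle: satisfy every task's minimum bandwidth
    requirement (as far as the total allows), then grant the leftover to
    the tasks closest to completion (an SRPT-like heuristic)."""
    for t in tasks:
        t.allocated = min(t.min_bandwidth, total)
        total -= t.allocated
    # Shortest remaining data first: finishing small tasks early lowers
    # the mean completion time of the parallel task set.
    for t in sorted(tasks, key=lambda t: t.remaining_data):
        if total <= 0:
            break
        # A task never needs more bandwidth than would drain its
        # remaining data within one allocation cycle.
        extra = min(total, t.remaining_data / cycle_s - t.allocated)
        if extra > 0:
            t.allocated += extra
            total -= extra
```

Handing the leftover to the task with the least remaining data is what drives the average latency down; a proportional share would instead tend to equalize finish times across tasks.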
Tab.3
| Parameter | Meaning |
|---|---|
|  | IO proxy node |
|  | Data space, the basic data storage unit in the GVDS |
|  | File in the GVDS, the basic unit of user operations |
|  | Data block of a file, the logical organization of the file |
|  | IOPS to the data space in the cycle |
|  | Mapping relationship between a data space and an IO proxy node |
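One plausible reading of the mapping relationship above is a load-balancing placement: each data space is assigned to the IO proxy node with the lowest accumulated IOPS in the current cycle. The sketch below illustrates that reading under stated assumptions; the greedy heaviest-first policy and all identifiers are hypothetical, not the strategy's actual implementation.

```python
def map_data_spaces(spaces_iops: dict[str, float],
                    proxies: list[str]) -> dict[str, str]:
    """Assign each data space to an IO proxy node so that per-node IOPS
    stays balanced: place the heaviest spaces first, each on the
    least-loaded proxy (hypothetical greedy policy)."""
    load = {p: 0.0 for p in proxies}
    mapping = {}
    for space, iops in sorted(spaces_iops.items(),
                              key=lambda kv: kv[1], reverse=True):
        target = min(load, key=load.get)  # least-loaded proxy so far
        mapping[space] = target
        load[target] += iops
    return mapping

# Example: three data spaces with IOPS sampled over the last cycle.
print(map_data_spaces({"ds1": 1200.0, "ds2": 800.0, "ds3": 500.0},
                      ["proxy-A", "proxy-B"]))
```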
Tab.4
| Parameter | Meaning |
|---|---|
|  | IO request set processed by the IO proxy node |
|  | IO request set with priority |
|  | Number of IO requests |
|  | T-th cycle of thread resource allocation |
|  | Maximum queuing delay that an IO request can accept |
|  | Average queuing delay of an IO request, calculated by sampling |
|  | Difference between the maximum and the average queuing delay |
|  | Number of threads in the thread pool of the IO proxy node |
|  | Minimum number of threads allocated to each type of request |
|  | Extra number of threads allocated to each type of request |
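These parameters suggest a per-cycle allocation in which every request type is guaranteed the minimum thread count and the extra threads follow queuing-delay slack. The sketch below implements that reading: the urgency weighting (inverse of the gap between the maximum acceptable and the sampled average queuing delay) is an illustrative assumption, and all identifiers are hypothetical.

```python
def allocate_threads(types: dict[str, tuple[float, float]],
                     pool_size: int, min_threads: int) -> dict[str, int]:
    """Per-cycle thread allocation on an IO proxy node. Each request type
    is guaranteed `min_threads`; extra threads go preferentially to types
    whose sampled average queuing delay is closest to the maximum delay
    they can accept. `types` maps type -> (max_delay, avg_delay)."""
    alloc = {t: min_threads for t in types}
    extra = pool_size - min_threads * len(types)
    # Urgency: the smaller the remaining delay slack, the larger the weight.
    slack = {t: max(d_max - d_avg, 1e-6) for t, (d_max, d_avg) in types.items()}
    urgency = {t: 1.0 / s for t, s in slack.items()}
    total = sum(urgency.values())
    for t in types:
        alloc[t] += int(extra * urgency[t] / total)
    # Integer truncation may strand a thread or two; a real allocator
    # would hand the remainder to the most urgent type.
    return alloc

# Read requests sit close to their deadline, so they get most extras.
print(allocate_threads({"read": (10.0, 9.0), "write": (10.0, 4.0)},
                       pool_size=16, min_threads=2))
```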
Tab.5
| Hardware and software | Configuration |
|---|---|
| Operating system | Ubuntu 18.04, 64-bit (best supported by the GVDS) |
| CPU | ecs.g7.large (2 cores) |
| Memory | 8 GB |
| HDD | 40 GB |
| Kernel version | 4.15 |
| WAN bandwidth | 100 Mbps |
| GVDS version | 1.6.9 (including MODM and RAAS) |
Tab.6
| Supercomputing center | # of servers | Storage capacity/TB | # of cores |
|---|---|---|---|
| CAS | 2 | 266.44 | 112 |
| SSC | 4 | 1208.121 | 136 |
| NSCC-JN | 4 | 87.4 | 156 |
| NSCC-GZ | 4 | 1.45 | 192 |
| NSCC-CS | 2 | 715.75 | 64 |
Tab.7
| Supercomputing center | BUAA/(MB·s⁻¹) | NSCC-JN/(MB·s⁻¹) | SSC/(MB·s⁻¹) | NSCC-GZ/(MB·s⁻¹) | NSCC-CS/(MB·s⁻¹) | CAS/(MB·s⁻¹) |
|---|---|---|---|---|---|---|
| BUAA | − | 31.2/26.5 | 17.6/15.0 | 10.7/0.6 | 7.4/3.0 | 102.7/14.6 |
| NSCC-JN | 31.2/26.5 | − | 33.5/20.0 | 3.9/2.5 | 3.3/6.9 | 102.2/21.7 |
| SSC | 17.6/15.0 | 33.5/20.0 | − | 8.9/2.9 | 24.9/7.5 | 67.8/19.4 |
| NSCC-GZ | 10.7/0.6 | 3.9/2.5 | 8.9/2.9 | − | 3.8/2.7 | 9.6/0.8 |
| NSCC-CS | 7.4/3.0 | 3.3/6.9 | 24.9/7.5 | 3.8/2.7 | − | 5.9/0.9 |
| CAS | 102.7/14.6 | 102.2/21.7 | 67.8/19.4 | 9.6/0.8 | 5.9/0.9 | − |
Tab.8
| Size of data block/KB | Sequential read bandwidth, NSCC-CS to GZ/(MB/s) | Sequential read bandwidth, NSCC-JN to GZ/(MB/s) | Bandwidth utilization/% |
|---|---|---|---|
| 8 | 3.554 | 3.526 | 91.13 |
| 16 | 3.608 | 3.590 | 92.65 |
| 32 | 3.659 | 3.614 | 93.62 |
| 64 | 3.612 | 3.636 | 93.28 |
| 128 | 3.643 | 3.538 | 92.44 |
| 256 | 3.606 | 3.463 | 91.02 |
| 512 | 3.701 | 3.517 | 92.94 |
| 1024 | 3.729 | 3.587 | 94.19 |
| 2048 | 3.598 | 3.561 | 92.14 |
| 4096 | 3.698 | 3.613 | 94.12 |
| 8192 | 3.666 | 3.661 | 94.31 |
| Average | 3.643 | 3.573 | 92.90 |
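The utilization column is consistent with averaging the two concurrent streams against the available per-stream bandwidth on the NSCC-GZ links; that baseline is not listed in this section, but back-solving the rows puts it at roughly 3.885 MB/s. For the 8 KB row:

\[
U = \frac{(B_{\mathrm{CS\to GZ}} + B_{\mathrm{JN\to GZ}})/2}{B_{\mathrm{avail}}}
  = \frac{(3.554 + 3.526)/2}{3.885\ \mathrm{MB/s}} \approx 91.1\%,
\]

which matches the reported 91.13% to rounding; the write table below follows the same relation.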
Tab.9
| Size of data block/KB | Sequential write bandwidth, NSCC-CS to GZ/(MB/s) | Sequential write bandwidth, NSCC-JN to GZ/(MB/s) | Bandwidth utilization/% |
|---|---|---|---|
| 8 | 3.760 | 3.764 | 96.84 |
| 16 | 3.736 | 3.692 | 95.62 |
| 32 | 3.433 | 3.736 | 92.19 |
| 64 | 3.734 | 3.798 | 96.93 |
| 128 | 3.687 | 3.811 | 96.46 |
| 256 | 3.772 | 3.811 | 97.59 |
| 512 | 3.647 | 3.782 | 95.59 |
| 1024 | 3.793 | 3.805 | 97.78 |
| 2048 | 3.680 | 3.673 | 94.65 |
| 4096 | 3.646 | 3.683 | 94.32 |
| 8192 | 3.624 | 3.787 | 95.35 |
| Average | 3.760 | 3.764 | 95.76 |