Frontiers of Computer Science
Front. Comput. Sci.    2024, Vol. 18 Issue (6) : 186505    https://doi.org/10.1007/s11704-023-3087-8
RESEARCH ARTICLE
Research on performance optimization of virtual data space across WAN
Jiantong HUO 1,2,3, Zhisheng HUO 1,2,3 (corresponding author), Limin XIAO 1,2, Zhenxue HE 4
1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3. High Performance Computing Center, Beihang University, Beijing 100191, China
4. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
Abstract

In high-performance computing across a WAN, the national supercomputing centers are geographically scattered and the network topology is complex, so it is difficult to form a unified view of their resources. To aggregate the widely dispersed storage resources of the national supercomputing centers in China, we previously proposed a global virtual data space named GVDS in the project "High Performance Computing Virtual Data Space", part of the National Key Research and Development Program of China. The GVDS enables large-scale high-performance computing applications to run efficiently across the WAN. However, the applications running on the GVDS are often data-intensive, requiring large amounts of data from multiple supercomputing centers across the WAN. As a result, the GVDS suffers from performance bottlenecks in data migration and data access across the WAN. To solve this problem, this paper proposes a performance optimization framework for the GVDS that comprises a multitask-oriented data migration method and a request access-aware IO proxy resource allocation strategy. In a WAN environment, the proposed framework makes efficient migration decisions based on the amount of data to be migrated and the number of data sources, guaranteeing a lower average migration latency when multiple migration tasks run in parallel. In addition, it ensures that the thread resources of an IO proxy node (the IO proxy is a module of the GVDS) are fairly allocated among different types of requests, thereby improving application performance across the WAN. The experimental results show that the framework effectively reduces the average data access delay of the GVDS while greatly improving application performance.
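The framework couples two decision points: a migration planner that weighs the amount of data to migrate against the number of available source centers, and an IO proxy allocator that divides a proxy node's thread pool among request types. The following minimal Python sketch only illustrates how such components could fit together; the names (MigrationTask, MigrationPlanner, IOProxyAllocator) and the even-split and proportional policies are placeholders, not the methods proposed in the paper.

    # Hypothetical sketch of the two cooperating components; names and policies are placeholders.
    from dataclasses import dataclass

    @dataclass
    class MigrationTask:
        data_mb: float       # amount of data the task has to move
        sources: list        # candidate source centers holding the data

    class MigrationPlanner:
        """Decides how much each source center contributes to a task (placeholder policy)."""
        def plan(self, tasks):
            plans = []
            for t in tasks:
                share = t.data_mb / max(len(t.sources), 1)   # naive even split across sources
                plans.append({src: share for src in t.sources})
            return plans

    class IOProxyAllocator:
        """Splits a proxy node's thread pool among request types (placeholder policy)."""
        def allocate(self, threads_total, request_counts):
            total = sum(request_counts.values()) or 1
            return {kind: max(1, round(threads_total * n / total))
                    for kind, n in request_counts.items()}

A caller would run the planner once per scheduling cycle over the pending migration tasks and feed the allocator the observed request mix on each IO proxy node.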

Keywords: storage aggregation across WANs; large-scale applications; GVDS; data migration; allocation of IO proxy resource
Corresponding Author(s): Zhisheng HUO   
Just Accepted Date: 06 June 2023   Issue Date: 21 September 2023
 Cite this article:   
Jiantong HUO, Zhisheng HUO, Limin XIAO, et al. Research on performance optimization of virtual data space across WAN[J]. Front. Comput. Sci., 2024, 18(6): 186505.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3087-8
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I6/186505
Fig.1  Performance optimization framework
Fig.2  Architecture of MODM
Parameter | Meaning
DCi | Any supercomputing center i
M | Migration task set
Mre | Timed task set that must be completed before the deadline
Mbest | Opportunistic task set for the copy data layout
m | A migration task
MB_DCi | Amount of data migrated to DCi
MB_mintime_DCi | Minimum delay of migrating MB_DCi to DCi
MB_minBW_DCi | Minimum bandwidth for migrating MB_DCi to DCi
Tab.1  Parameters of migration tasks
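The notation of Tab.1 maps naturally onto one record per migration task. The sketch below is illustrative bookkeeping only; the field names are chosen here and are not taken from the GVDS implementation.

    # Hypothetical record mirroring Tab.1's per-task parameters (field names chosen here).
    from dataclasses import dataclass

    @dataclass
    class MigrationTaskRecord:
        dest_dc: str           # DCi, the destination supercomputing center
        mb_dc: float           # MB_DCi, amount of data to migrate to DCi
        mb_mintime: float      # MB_mintime_DCi, minimum delay of migrating MB_DCi to DCi
        mb_min_bw: float       # MB_minBW_DCi, minimum bandwidth of migrating MB_DCi to DCi
        timed: bool            # True if the task belongs to Mre (deadline-bound)
        opportunistic: bool    # True if the task belongs to Mbest (copy data layout)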
Parameter | Meaning
DCi | Any supercomputing center i
IOnodej | Any IO proxy node j
BW(IOnodej) | Maximum bandwidth of IO proxy node j
FBW(IOnodej) | Maximum foreground bandwidth of IO proxy node j
BBW(IOnodej) | Maximum background bandwidth of IO proxy node j
fbw(IOnodej) | Used foreground bandwidth of IO proxy node j
bbw(IOnodej) | Used background bandwidth of IO proxy node j
BW_DCi-DCj | Maximum bandwidth between DCi and DCj
FBW_DCi-DCj | Maximum foreground bandwidth between DCi and DCj
BBW_DCi-DCj | Maximum background bandwidth between DCi and DCj
fbw_DCi-DCj | Used foreground bandwidth between DCi and DCj
bbw_DCi-DCj | Used background bandwidth between DCi and DCj
Tab.2  Parameters of supercomputing center bandwidth
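Tab.2 splits each capacity figure into a foreground share (application IO) and a background share (migration traffic), tracked both per IO proxy node and per pair of centers. The small hypothetical helper below shows how the headroom available to a migration could be derived from these quantities; the dictionary keys mirror Tab.2's symbols, but the function itself is an assumption and not part of GVDS.

    # Hypothetical helper combining Tab.2's per-link and per-node background bandwidth figures.
    def migration_headroom(link, node):
        # link: {'BBW': max background bandwidth between DCi and DCj, 'bbw': share already used}
        # node: {'BBW': max background bandwidth of the destination IO proxy node, 'bbw': used share}
        free_link = max(link['BBW'] - link['bbw'], 0.0)
        free_node = max(node['BBW'] - node['bbw'], 0.0)
        return min(free_link, free_node)   # a migration can exceed neither constraint

    # migration_headroom({'BBW': 100, 'bbw': 40}, {'BBW': 80, 'bbw': 50})  -> 30.0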
Fig.3  Allocation of IBAM
Parameter | Meaning
T | The Tth bandwidth allocation cycle
l | Maximum bandwidth limit for each migration task
r | Minimum bandwidth requirement for each migration task
p | Amount of bandwidth allocated to a migration task
L | L label for sorting
R | R label for sorting
P | P label for sorting
syncUnit | Largest data transmission unit when performing migration
t | Execution time of bandwidth allocation
MB_left | Amount of data remaining to be migrated in the migration task
Tab.3  Parameters of bandwidth allocation for migration tasks
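A per-cycle allocator consistent with Tab.3 would first satisfy every task's minimum requirement r and then distribute the leftover bandwidth up to each task's limit l, visiting tasks in the order given by the L/R/P labels. The sketch below implements that simple water-filling reading; it is one plausible interpretation of the parameters, not the IBAM algorithm itself.

    # Hypothetical per-cycle bandwidth allocation consistent with Tab.3's parameters.
    def allocate_cycle(tasks, total_bw):
        # tasks: dicts with 'r' (minimum requirement), 'l' (maximum limit) and 'label' (sort key).
        remaining = total_bw
        for t in tasks:                        # stage 1: guarantee every task its minimum r
            t['p'] = t['r']
            remaining -= t['r']
        for t in sorted(tasks, key=lambda x: x['label']):
            extra = min(t['l'] - t['p'], max(remaining, 0.0))
            t['p'] += extra                    # stage 2: top up toward the limit l, in label order
            remaining -= extra
        return [t['p'] for t in tasks]         # p granted to each task in this cycle T

Oversubscription of the minima is not handled in this sketch; the actual method presumably also accounts for the transmission granularity syncUnit and the remaining data MB_left of each task.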
Fig.4  Implementation of MODM in the GVDS
Fig.5  Process of MODM
Fig.6  Design of RASS
Parameter | Meaning
IOnode | IO proxy node
S | Data space, the basic data storage unit in the GVDS
F | File in the GVDS, the basic unit of user operations
B | Data block of a file, the logical organization of the file
IOPS(S) | IOPS to the data space S in the cycle
M | Mapping relationship between data spaces and IO proxy nodes
Tab.4  Parameters of IO proxy node’s workload model
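The workload model of Tab.4 is essentially a mapping M from data spaces S to IO proxy nodes, annotated with the IOPS observed against each space during a cycle. A minimal, hypothetical sketch of that bookkeeping follows; the class and method names are illustrative and not from the GVDS source.

    # Hypothetical bookkeeping for Tab.4's workload model of the IO proxy nodes.
    from collections import defaultdict

    class ProxyWorkloadModel:
        def __init__(self):
            self.mapping = {}                 # M: data space S -> IO proxy node
            self.iops = defaultdict(int)      # IOPS(S) observed during the current cycle

        def record_request(self, space):
            self.iops[space] += 1             # one file/block request hitting data space S

        def load_of(self, node):
            # Total IOPS of all data spaces currently mapped to this IO proxy node.
            return sum(count for space, count in self.iops.items()
                       if self.mapping.get(space) == node)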
Fig.7  Mapping between data space and IO proxy node of AWPA
Fig.8  Mapping between data space and IO proxy node of RTDO
Parameter | Meaning
R | IO request set processed by the IO proxy node
Ri | IO request set with priority i
num | Number of IO requests
T | The Tth cycle of thread resource allocation
lat_limt | Maximum queuing delay that an IO request can accept
lat_avg | Average queuing delay of IO requests, calculated by sampling
diff | Difference between the maximum queuing delay and the average queuing delay
Td | Number of threads in the thread pool of the IO proxy node
Td_limt | Minimum number of threads allocated to each type of request
Td_prop | Extra number of threads allocated to each type of request
Tab.5  Parameters of RTDO
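Tab.5 suggests a two-stage split of the proxy node's thread pool: every request type first receives a floor of Td_limt threads, and the remaining threads are handed out as Td_prop, weighted by how small the headroom diff = lat_limt - lat_avg has become for each type. The sketch below encodes that reading; it is an assumption about the shape of RTDO, not the published algorithm.

    # Hypothetical thread-pool split over request types, patterned on Tab.5's parameters.
    def split_threads(classes, pool_size, floor):
        # classes: dict name -> {'lat_limt': max acceptable queuing delay, 'lat_avg': sampled average}
        alloc = {name: floor for name in classes}                 # Td_limt threads per type
        spare = max(pool_size - floor * len(classes), 0)          # threads left over for Td_prop
        diff = {name: max(c['lat_limt'] - c['lat_avg'], 1e-6) for name, c in classes.items()}
        weight = {name: 1.0 / d for name, d in diff.items()}      # less headroom -> more weight
        total = sum(weight.values()) or 1.0
        for name in classes:
            alloc[name] += int(spare * weight[name] / total)
        return alloc

Under this reading, a request type whose average queuing delay is already close to its bound receives most of the spare threads in the next cycle T.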
Fig.9  Implementation of AWPA and RTDO in the GVDS
Fig.10  Process of RASS
Hardware and software | Configuration
Operating system | Ubuntu 18.04 64-bit (better supports the GVDS)
CPU | ecs.g7.large (2 cores)
Memory | 8 GB
HDD | 40 GB
Kernel version | 4.15
WAN bandwidth | 100 Mbps
GVDS version | 1.6.9 (GVDS including MODM and RASS)
Tab.6  Hardware and software configurations
Fig.11  Network topology of MODM for performance
Fig.12  Comparison of data migration method. (a) Total running time of all migration tasks; (b) bandwidth of all migration tasks
Fig.13  Network topology of MODM for fairness
Fig.14  Comparison of multiple migration tasks’ bandwidth allocation
Fig.15  Network topology of AWPA
Fig.16  Comparison of read and write access bandwidth. (a) Bandwidth comparison of read; (b) bandwidth comparison of write
Fig.17  Network topology of RTDO
Fig.18  IOPS comparison of create and remove. (a) IOPS comparison of create; (b) IOPS comparison of remove
Fig.19  IOPS comparison of stat
Fig.20  Experimental deployment of GVDS
Supercomputing center | # of servers | Storage capacity/TB | # of cores
CAS2266.44112 cores
SSC41208.121136 cores
NSCC-JN487.4156 cores
NSCC-GZ41.45192 cores
NCSS-CS2715.7564 cores
Tab.7  Experimental configuration
Supercomputing center | BUAA | NSCC-JN | SSC | NSCC-GZ | NSCC-CS | CAS (all bandwidths in MB·s⁻¹)
BUAA | - | 31.2 ± 26.5 | 17.6 ± 15.0 | 10.7 ± 0.6 | 7.4 ± 3.0 | 102.7 ± 14.6
NSCC-JN | 31.2 ± 26.5 | - | 33.5 ± 20.0 | 3.9 ± 2.5 | 3.3 ± 6.9 | 102.2 ± 21.7
SSC | 17.6 ± 15.0 | 33.5 ± 20.0 | - | 8.9 ± 2.9 | 24.9 ± 7.5 | 67.8 ± 19.4
NSCC-GZ | 10.7 ± 0.6 | 3.9 ± 2.5 | 8.9 ± 2.9 | - | 3.8 ± 2.7 | 9.6 ± 0.8
NSCC-CS | 7.4 ± 3.0 | 3.3 ± 6.9 | 24.9 ± 7.5 | 3.8 ± 2.7 | - | 5.9 ± 0.9
CAS | 102.7 ± 14.6 | 102.2 ± 21.7 | 67.8 ± 19.4 | 9.6 ± 0.8 | 5.9 ± 0.9 | -
Tab.8  Network bandwidth between any two supercomputing centers
Fig.21  Network bandwidth status between supercomputing centers
Fig.22  Comparison of ReID between the original GVDS and the extended GVDS. (a) Comparison of subtask running time; (b) comparison of total running time
Size of data block/KB | Sequential read bandwidth (NSCC-CS to GZ)/(MB/s) | Sequential read bandwidth (NSCC-JN to GZ)/(MB/s) | Bandwidth utilization/%
8 | 3.554 | 3.526 | 91.13
16 | 3.608 | 3.59 | 92.65
32 | 3.659 | 3.614 | 93.62
64 | 3.612 | 3.636 | 93.28
128 | 3.643 | 3.538 | 92.44
256 | 3.606 | 3.463 | 91.02
512 | 3.701 | 3.517 | 92.94
1024 | 3.729 | 3.587 | 94.19
2048 | 3.598 | 3.561 | 92.14
4096 | 3.698 | 3.613 | 94.12
8192 | 3.666 | 3.661 | 94.31
Average | 3.643 | 3.573 | 92.90
Tab.9  Bandwidth utilization for sequential read
Size of data block/KB | Sequential write bandwidth (NSCC-CS to GZ)/(MB/s) | Sequential write bandwidth (NSCC-JN to GZ)/(MB/s) | Bandwidth utilization/%
8 | 3.76 | 3.764 | 96.84
16 | 3.736 | 3.692 | 95.62
32 | 3.433 | 3.736 | 92.19
64 | 3.734 | 3.798 | 96.93
128 | 3.687 | 3.811 | 96.46
256 | 3.772 | 3.811 | 97.59
512 | 3.647 | 3.782 | 95.59
1024 | 3.793 | 3.805 | 97.78
2048 | 3.68 | 3.673 | 94.65
4096 | 3.646 | 3.683 | 94.32
8192 | 3.624 | 3.787 | 95.35
Average | 3.76 | 3.764 | 95.76
Tab.10  Bandwidth utilization for sequential write