Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (1) : 171101    https://doi.org/10.1007/s11704-021-0418-5
RESEARCH ARTICLE
Kronos: towards bus contention-aware job scheduling in warehouse scale computers
Shuai XUE1,2, Shang ZHAO1,2, Quan CHEN1,2(), Zhuo SONG2, Shanpei CHEN2, Tao MA2, Yong YANG2, Wenli ZHENG1, Minyi GUO1
1. Shanghai Jiao Tong University, Shanghai 200240, China
2. Alibaba Group, Hangzhou 311121, China
Abstract

While researchers have proposed many techniques to mitigate contention on the shared cache and memory bandwidth, none of them considers the memory bus contention caused by split locks. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split locks, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes how much bus-contention “pressure” each job tolerates and builds a tolerance model with polynomial regression. The scheduler then allocates user jobs to physical nodes in a contention-aware manner under one of three policies: minimizing the number of required nodes while ensuring the Service Level Agreement (SLA) of all user jobs; minimizing the number of jobs that suffer SLA violations when nodes are insufficient; and maximizing overall performance when SLAs are not considered. With these three policies, Kronos reduces the number of required nodes by 42.1% while ensuring the SLA of all jobs, reduces the number of jobs that suffer SLA violations by 72.8% when nodes are insufficient, and improves overall performance by 35.2% when SLAs are not considered.

Keywords bus contention      split lock      schedule      high performance      cloud     
Corresponding Author(s): Quan CHEN   
Just Accepted Date: 26 January 2021   Issue Date: 01 March 2022
 Cite this article:   
Shuai XUE, Shang ZHAO, Quan CHEN, et al. Kronos: towards bus contention-aware job scheduling in warehouse scale computers[J]. Front. Comput. Sci., 2023, 17(1): 171101.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-021-0418-5
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I1/171101
Fig.1  Bus contention between the co-located VMs
Fig.2  Split lock of Autodesk Maya and 3ds Max. (a) Autodesk 3ds Max; (b) Autodesk Maya
Fig.3  
Specifications
Hardware | CPU: Intel Xeon Platinum 8163 @ 2.50 GHz; microarchitecture codename: Skylake-SP; L1d cache: 32 KB; L1i cache: 32 KB; L2 cache: 1024 KB; L3 cache: 32 MB; memory: 6 × 32 GB DDR4-2666 DIMMs
Software | CentOS 7 x86_64 (Kernel 4.10)
Peak memory bandwidth | 256 GB/s per socket
Peak split lock frequency | 260,000 locks/s
Benchmark jobs
PARSEC | bodytrack (bt), canneal (cn), dedup (du), x264 (x2), ferret (fr), fluidanimate (fa), raytrace (rt), streamcluster (sc), vips (vs), facesim (fs)
Tab.1  Investigation configuration
Fig.4  The performance of the benchmarks when they are co-located with the bus polluter and the corresponding memory bandwidth usage at co-location
Fig.5  The impact of split lock on the data access latency from L1 cache, L2 cache, L3 cache and the main memory
Fig.6  The latency of accessing data from the main memory with different bus contention
Environment | CPU platform | Idle latency (no split lock) /ns | Idle latency (peak split lock) /ns | Rate
Local | Intel Xeon 8163 | 118.4 | 959.7 | 8.10
AWS m5.xlarge | Intel Xeon 8259CL | 90.8 | 782.6 | 8.61
Azure D4s_v4 | Intel Xeon 8272CL | 84.5 | 573.2 | 6.78
Azure D4as_v4 | AMD EPYC 7252 | 116.5 | 1557.8 | 13.3
Tab.2  Memory idle access latency for public clouds
Fig.7  The design architecture of Kronos
Fig.8  Measuring the tolerance of a VM on the memory bus contention
Fig.9  Accuracy of the bus contention tolerance meter
Fig.10  
Fig.11  The relationship between the split lock frequency of one node and the probability that the node can place a job
Fig.12  The placement priority of scenario I
Fig.13  
Fig.14  The placement priority of scenario II. (a) There are nodes where the frequency of split lock is below the maximum tolerance frequency; (b) The split lock frequency of all nodes is higher than the maximum tolerance frequency
Fig.15  
Scenario | Scale | # of nodes, # of jobs
I | Small | 30, 30
I | Medium | 60, 60
I | Large | 120, 120
II | Small | 30, 120
II | Medium | 60, 240
II | Large | 120, 480
III | / | 30, 30

Benchmark | Jobs
PARSEC [32] | The 10 jobs in Table 1
Microbench (split locks/s) | Lock1 (20K), Lock2 (40K), Lock3 (60K), Lock4 (90K), Lock5 (120K)
Tab.3  Experimental configuration
Fig.16  The number of required physical nodes to fulfill the SLA of all the jobs in 100 test cases (Scenario I)
Fig.17  Job placement on each node in one test case with Kronos and First-Fit for minimizing node usage (Scenario I)
Fig.18  The number of jobs that suffer from SLA violation in 100 test cases (Scenario II)
Fig.19  The number of jobs that suffer from SLA violation on each node in one test case (Scenario II)
Fig.20  The overall running time performance in 100 test cases (Scenario III)
Fig.21  The running time of each job in one test case (Scenario III)
Fig.22  The scalability of Kronos. Small, Medium, and Large represent the scales of each scenario summarized in Table 3. (a) Scenario I; (b) Scenario II
1 C D Antonopoulos, D S Nikolopoulos, T S Papatheodorou. Scheduling algorithms with bus bandwidth considerations for SMPs. In: Proceedings of the 2003 International Conference on Parallel Processing. 2003, 547–554
2 D Xu, C Wu, P C Yew. On mitigating memory bandwidth contention through bandwidth-aware scheduling. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT). 2010, 237–247
3 J Chang, G S Sohi. Cooperative cache partitioning for chip multiprocessors. In: Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume. 2007, 402–412
4 S Kim, D Chandra, Y Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques. 2004, 111–122
5 M K Qureshi, Y N Patt. Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06). 2006, 423–432
6 G Lee, N Tolia, P Ranganathan, R H Katz. Topology-aware resource allocation for data-intensive workloads. In: Proceedings of the 1st ACM Asia-Pacific Workshop on Systems. 2010, 1–6
7 Q Lv, X Shi, L Zhou. Based on ant colony algorithm for cloud management platform resources scheduling. In: Proceedings of the World Automation Congress 2012. 2012, 1–4
8 X Wen, M Huang, J Shi. Study on resources scheduling based on ACO algorithm and PSO algorithm in cloud computing. In: Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science. 2012, 219–222
9 K Zhu, H Song, L Liu, J Gao, G Cheng. Hybrid genetic algorithm for cloud computing applications. In: Proceedings of the 2011 IEEE Asia-Pacific Services Computing Conference. 2011, 182–187
10 Enable split locked accesses detection. See lwn.net/Articles/784864/ website, 2020
11 E Koukis, N Koziris. Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of SMPs. In: Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS’06). 2006, 10
12 F Pinel, J E Pecero, P Bouvry, S U Khan. Memory-aware green scheduling on multi-core processors. In: Proceedings of the 39th International Conference on Parallel Processing Workshops. 2010, 485–488
13 H S Stone, J Turek, J L Wolf. Optimal partitioning of cache memory. IEEE Transactions on Computers, 1992, 41(9): 1054–1068
14 M Sato, I Kotera, R Egawa, H Takizawa, H Kobayashi. A cache-aware thread scheduling policy for multi-core processors. In: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks. 2009, 109–114
15 A Fedorova, S Blagodurov, S Zhuravlev. Managing contention for shared resources on multicore processors. Communications of the ACM, 2010, 53(2): 49–57
16 Q Chen, Z Huang, M Guo, J Zhou. CAB: cache aware bi-tier task-stealing in multi-socket multi-core architecture. In: Proceedings of the 2011 International Conference on Parallel Processing. 2011, 722–732
17 Q Chen, M Guo, Z Huang. CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. In: Proceedings of the 26th ACM International Conference on Supercomputing. 2012, 163–172
18 Q Chen, Y Chen, Z Huang, M Guo. WATS: workload-aware task scheduling in asymmetric multi-core architectures. In: Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium. 2012, 249–260
19 Q Chen, M Guo, H Guan. LAWS: locality-aware work-stealing for multi-socket multi-core architectures. In: Proceedings of the 28th ACM International Conference on Supercomputing. 2014, 3–12
20 J Feliu, S Petit, J Sahuquillo, J Duato. Cache-hierarchy contention-aware scheduling in CMPs. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(3): 581–590
21 J Mars, L Tang, R Hundt, K Skadron, M L Soffa. Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. 2011, 248–259
22 H Yang, A Breslow, J Mars, L Tang. Bubble-Flux: precise online QoS management for increased utilization in warehouse scale computers. ACM SIGARCH Computer Architecture News, 2013, 41(3): 607–618
23 C Delimitrou, C Kozyrakis. Paragon: QoS-aware scheduling for heterogeneous datacenters. ACM SIGPLAN Notices, 2013, 48(4): 77–88
24 C Delimitrou, C Kozyrakis. Quasar: resource-efficient and QoS-aware cluster management. ACM SIGPLAN Notices, 2014, 49(4): 127–144
25 D Lo, L Cheng, R Govindaraju, P Ranganathan, C Kozyrakis. Heracles: improving resource efficiency at scale. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture. 2015, 450–462
26 S Chen, C Delimitrou, J F Martínez. PARTIES: QoS-aware resource partitioning for multiple interactive services. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 2019, 107–120
27 D Dasari, B Andersson, V Nelis, S M Petters, A Easwaran, J Lee. Response time analysis of COTS-based multicores considering the contention on the shared memory bus. In: Proceedings of the 10th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. 2011, 1068–1075
28 S A Rashid, G Nelissen, E Tovar. Cache persistence-aware memory bus contention analysis for multicore systems. In: Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). 2020, 442–447
29 D Dasari, V Nelis, B Akesson. A framework for memory contention analysis in multi-core platforms. Real-Time Systems, 2016, 52(3): 272–322
30 D Dasari, V Nelis. An analysis of the impact of bus contention on the WCET in multicores. In: Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & the 9th IEEE International Conference on Embedded Software and Systems. 2012, 1450–1457
31 Q Chen, S Xue, S Zhao, S Chen, Y Wu, Y Xu, Z Song, T Ma, Y Yang, M Guo. Alita: comprehensive performance isolation through bias resource management for public clouds. In: Proceedings of SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 2020, 1–13
32 C Bienia, S Kumar, J P Singh, K Li. The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT). 2008, 72–81
33 E Cortez, A Bonde, A Muzio, M Russinovich, M Fontoura, R Bianchini. Resource Central: understanding and predicting workloads for improved resource management in large cloud platforms. In: Proceedings of the 26th Symposium on Operating Systems Principles. 2017, 153–167
34 X Zhang, X Zheng, Z Wang, Q Li, J Fu, Y Zhang, Y Shen. Fast and scalable VMM live upgrade in large cloud infrastructure. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 2019, 93–105
35 H W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 1955, 2(1–2): 83–97