Researchers have proposed many techniques to mitigate contention on the shared cache and memory bandwidth, but none of them considers the memory bus contention caused by split locks. Our study shows that split locks may cause 9× longer data access latency without saturating the memory bandwidth. To minimize the impact of split locks, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes each job's tolerance to the “pressure” of bus contention and builds a tolerance model using polynomial regression. The job scheduler allocates user jobs to physical nodes in a contention-aware manner. We design three scheduling policies: the first minimizes the number of required nodes while ensuring the Service Level Agreement (SLA) of all user jobs; the second minimizes the number of jobs that suffer SLA violations when there are not enough nodes; and the third maximizes overall performance without considering SLA violations. With these three policies, Kronos reduces the number of required nodes by 42.1% while ensuring the SLA of all jobs, reduces the number of jobs that suffer SLA violations when nodes are insufficient by 72.8%, and improves overall performance by 35.2% when the SLA is not considered.
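As an illustration of the tolerance model only, the following minimal sketch fits a polynomial to (pressure, slowdown) samples and reads off the highest pressure level that a job can tolerate under a given slowdown limit. It is not the Kronos implementation: the function names, the use of slowdown as the performance metric, the polynomial degree, and the sample numbers are all assumptions made for this example.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit (hypothetical helper).

    Returns coefficients lowest degree first: c[0] + c[1]*x + c[2]*x**2 + ...
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def predict(coeffs, pressure):
    """Evaluate the fitted model at one pressure level (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * pressure + c
    return result


def max_tolerable_pressure(coeffs, sla_slowdown, levels):
    """Highest level reachable from the lowest one without exceeding the limit.

    Scans the levels in ascending order and stops at the first predicted
    violation; returns None if even the lowest level violates the limit.
    """
    best = None
    for level in sorted(levels):
        if predict(coeffs, level) > sla_slowdown:
            break
        best = level
    return best


if __name__ == "__main__":
    # Synthetic samples for illustration only, not measurements from the paper.
    pressures = [0, 1, 2, 3, 4]
    slowdowns = [1.0, 1.5, 3.0, 5.5, 9.0]
    model = fit_polynomial(pressures, slowdowns, degree=2)
    print(model)
    print(max_tolerable_pressure(model, sla_slowdown=3.5, levels=pressures))
```

A scheduler could, under these assumptions, compare the pressure expected on a node with each job's tolerable level before placing a job there. How Kronos actually measures pressure and combines it across co-located jobs is not specified in this abstract.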