Computer comparisons in the presence of performance variation

doi:10.1007/s11704-018-7319-2

Front. Comput. Sci., 2020, Vol. 14, Issue 1: 21-41. https://doi.org/10.1007/s11704-018-7319-2

RESEARCH ARTICLE


Samuel IRVING¹,², Bin LI¹, Shaoming CHEN¹, Lu PENG¹, Weihua ZHANG²,³,⁴, Lide DUAN⁵

¹. Louisiana State University, Baton Rouge, LA 70803, USA
². Shanghai Institute of Intelligent Electronics & Systems, Shanghai 201203, China
³. Software School, Fudan University, Shanghai 201203, China
⁴. Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 200433, China
⁵. University of Texas at San Antonio, San Antonio, TX 78249, USA


Abstract

Performance variability, stemming from nondeterministic hardware and software behaviors or from deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems. It increases the difficulty of comparing computer performance metrics and is slated to become even more of a concern as interest in Big Data analytics increases. Conventional methods compare computers using summary measures (such as the geometric mean) of their performance on different benchmarks without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number summary for performance evaluation. The results show that, for both PARSEC and high-variance BigDataBench benchmarks, 1) the randomization test substantially improves our chance of identifying a difference in performance when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., the ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance due to the variability of computer systems. We further propose using the empirical distribution to evaluate computer performance and a five-number summary to summarize it. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines. We achieve a correlation of 0.992 for predicted performance and a correlation of 0.5 between predicted and measured relative variation. Finally, we propose using a novel biplotting technique to visualize the effectiveness of benchmarks and to cluster machines by behavior. We illustrate the results and conclusions through detailed Monte Carlo simulation studies and real examples.
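
The three resampling ideas named in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sample timings, the per-benchmark label swap used for the randomization test, and the percentile form of the bootstrap interval are all choices made here for the example.

```python
import math
import random
import statistics


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gm_ratio(times_a, times_b):
    """Ratio of geometric means; below 1 means A has the lower times."""
    return geometric_mean(times_a) / geometric_mean(times_b)


def randomization_test(times_a, times_b, n_resamples=5000, seed=0):
    """Two-sided paired randomization test (assumed variant).

    Under the null hypothesis the machines are interchangeable, so the A/B
    labels of each benchmark are swapped at random and the statistic
    |log(ratio of geometric means)| is recomputed.
    """
    rng = random.Random(seed)
    observed = abs(math.log(gm_ratio(times_a, times_b)))
    extreme = 0
    for _ in range(n_resamples):
        perm_a, perm_b = [], []
        for a, b in zip(times_a, times_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        # Small tolerance guards against floating-point ties.
        if abs(math.log(gm_ratio(perm_a, perm_b))) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


def bootstrap_ci(times_a, times_b, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the ratio of geometric means.

    Benchmarks are resampled with replacement, keeping each A/B pair intact.
    """
    rng = random.Random(seed)
    n = len(times_a)
    ratios = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ratios.append(gm_ratio([times_a[i] for i in idx],
                               [times_b[i] for i in idx]))
    ratios.sort()
    lower = ratios[int((1 - level) / 2 * n_resamples)]
    upper = ratios[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Hypothetical execution times (seconds) on eight benchmarks.
TIMES_A = [10.0, 12.0, 8.0, 15.0, 20.0, 9.0, 11.0, 14.0]
TIMES_B = [11.0, 12.5, 8.2, 16.0, 21.5, 9.1, 11.9, 14.2]
# Hypothetical repeated runs of a single benchmark on machine A.
REPEATED_RUNS = [10.0, 10.4, 9.8, 10.1, 12.3, 10.0, 9.9]

if __name__ == "__main__":
    print("ratio of geometric means:", gm_ratio(TIMES_A, TIMES_B))
    print("randomization p-value:", randomization_test(TIMES_A, TIMES_B))
    print("95% bootstrap interval:", bootstrap_ci(TIMES_A, TIMES_B))
    print("five-number summary:", five_number_summary(REPEATED_RUNS))
```

The sketch works on the log scale inside the randomization test so that a ratio and its inverse count as equally extreme. It resamples whole benchmarks rather than individual runs, which is one of several reasonable designs when each benchmark contributes a single paired measurement.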

Keywords: performance of systems; variation; performance attributes; measurement, evaluation, modeling, simulation of multiple-processor systems; experimental design; Big Data

Corresponding Author(s): Weihua ZHANG

Just Accepted Date: 12 February 2018; Online First Date: 07 January 2019; Issue Date: 24 September 2019

Cite this article:

Samuel IRVING, Bin LI, Shaoming CHEN, et al. Computer comparisons in the presence of performance variation[J]. Front. Comput. Sci., 2020, 14(1): 21-41.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7319-2
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I1/21

