CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications
Chunjie LUO1, Jianfeng ZHAN1, Zhen JIA1, Lei WANG1, Gang LU1, Lixin ZHANG1, Cheng-Zhong XU2,3, Ninghui SUN1
1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100019, China; 2. Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA; 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, no existing benchmark suite evaluates cloud performance at the whole-system level. To the best of our knowledge, this paper proposes the first benchmark suite, CloudRank-D, to benchmark and rank cloud computing systems that are shared for running big data applications. We analyze the limitations of previous metrics, e.g., floating point operations, for evaluating a cloud computing system, and propose two simple, complementary metrics: data processed per second and data processed per Joule. We detail the design of CloudRank-D, which considers representative applications, diversity of data characteristics, and dynamic behaviors of both applications and system software platforms. Through experiments, we demonstrate the advantages of our proposed metrics. In several case studies, we evaluate two small-scale deployments of cloud computing systems using CloudRank-D.
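The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: total input data is divided by the wall-clock span from the first job submission to the last job completion, and by the measured energy over that span. The `Job` class, its field names, and both function names are hypothetical and are not part of CloudRank-D.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Job:
    """Hypothetical record of one job in a benchmark run."""
    input_bytes: int     # size of the job's input data
    submit_time: float   # submission time, in seconds
    finish_time: float   # completion time, in seconds


def data_processed_per_second(jobs: Sequence[Job]) -> float:
    """Assumed definition: total input bytes / span of the whole run."""
    if not jobs:
        raise ValueError("at least one job is required")
    total_bytes = sum(j.input_bytes for j in jobs)
    # Span from the earliest submission to the latest completion.
    span = max(j.finish_time for j in jobs) - min(j.submit_time for j in jobs)
    if span <= 0:
        raise ValueError("run span must be positive")
    return total_bytes / span


def data_processed_per_joule(jobs: Sequence[Job], energy_joules: float) -> float:
    """Assumed definition: total input bytes / energy consumed by the run."""
    if not jobs:
        raise ValueError("at least one job is required")
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return sum(j.input_bytes for j in jobs) / energy_joules


if __name__ == "__main__":
    run = [Job(1000, 0.0, 10.0), Job(3000, 5.0, 20.0)]
    print(data_processed_per_second(run))
    print(data_processed_per_joule(run, energy_joules=8000.0))
```

Under these assumed definitions, both metrics depend only on input size, elapsed time, and energy, so they can be compared across workloads that perform few or no floating point operations.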
Keywords
data center systems
clouds
big data applications
benchmarks
evaluation metrics

Corresponding Author(s):
ZHAN Jianfeng, Email: zhanjianfeng@ict.ac.cn

Issue Date: 01 August 2012