Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci, 2012: 347-362    https://doi.org/10.1007/s11704-012-2118-7
RESEARCH ARTICLE
CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications
Chunjie LUO¹, Jianfeng ZHAN¹, Zhen JIA¹, Lei WANG¹, Gang LU¹, Lixin ZHANG¹, Cheng-Zhong XU²,³, Ninghui SUN¹
1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100019, China; 2. Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA; 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract

With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, no existing benchmark suite evaluates cloud performance at the whole-system level. To the best of our knowledge, this paper proposes the first benchmark suite, CloudRank-D, for benchmarking and ranking cloud computing systems that are shared for running big data applications. We analyze the limitations of previous metrics, e.g., floating point operations, for evaluating a cloud computing system, and propose two simple, complementary metrics: data processed per second and data processed per Joule. We detail the design of CloudRank-D, which considers representative applications, the diversity of data characteristics, and the dynamic behaviors of both applications and system software platforms. Through experiments, we demonstrate the advantages of our proposed metrics. In several case studies, we use CloudRank-D to evaluate two small-scale deployments of cloud computing systems.
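The two metrics named above reduce to simple ratios. The sketch below is a minimal illustration, assuming that data processed per second is the total input data of a workload run divided by its wall-clock time, and that data processed per Joule is the same data volume divided by the energy the system consumed during that run; the exact measurement rules used by CloudRank-D may differ. The `RunRecord` type, the helper functions, the system names `cluster-A` and `cluster-B`, and all numbers are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One benchmark run on one system (hypothetical record layout, not from the paper)."""
    system: str
    data_bytes: float  # total input data processed by the workload, in bytes
    elapsed_s: float   # wall-clock time for the whole workload, in seconds
    energy_j: float    # energy consumed by the system during the run, in Joules


def data_per_second(r: RunRecord) -> float:
    """Data processed per second (bytes/s); assumed definition: data / elapsed time."""
    if r.elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return r.data_bytes / r.elapsed_s


def data_per_joule(r: RunRecord) -> float:
    """Data processed per Joule (bytes/J); assumed definition: data / energy."""
    if r.energy_j <= 0:
        raise ValueError("energy must be positive")
    return r.data_bytes / r.energy_j


def rank(records, metric):
    """Return system names ordered best to worst under `metric` (higher is better)."""
    return [r.system for r in sorted(records, key=metric, reverse=True)]


if __name__ == "__main__":
    GB = 1e9
    # Made-up runs of the same 100 GB workload on two hypothetical deployments.
    runs = [
        RunRecord("cluster-A", data_bytes=100 * GB, elapsed_s=2000.0, energy_j=4.0e6),
        RunRecord("cluster-B", data_bytes=100 * GB, elapsed_s=2500.0, energy_j=3.0e6),
    ]
    for r in runs:
        print(f"{r.system}: {data_per_second(r) / 1e6:.1f} MB/s, "
              f"{data_per_joule(r) / 1e3:.1f} KB/J")
    print("ranked by data per second:", rank(runs, data_per_second))
    print("ranked by data per Joule:", rank(runs, data_per_joule))
```

With these made-up numbers, cluster-A ranks first by data processed per second while cluster-B ranks first by data processed per Joule, which illustrates why the abstract treats the two metrics as complementary rather than interchangeable.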

Keywords: data center systems, clouds, big data applications, benchmarks, evaluation metrics
Corresponding Author(s): ZHAN Jianfeng, Email: zhanjianfeng@ict.ac.cn
Issue Date: 01 August 2012
 Cite this article:   
Chunjie LUO, Jianfeng ZHAN, Zhen JIA, et al. CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications[J]. Front Comput Sci, 2012: 347-362.
URL: https://academic.hep.com.cn/fcs/EN/10.1007/s11704-012-2118-7