Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Front Comput Sci    2013, Vol. 7 Issue (3) : 431-445    https://doi.org/10.1007/s11704-013-2193-4
RESEARCH ARTICLE
An online service-oriented performance profiling tool for cloud computing systems
Haibo MI1, Huaimin WANG1, Yangfan ZHOU2, Michael Rung-Tsong LYU2, Hua CAI3, Gang YIN1
1. National Lab for Parallel & Distributed Processing, National University of Defense Technology, Changsha 410073, China; 2. Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen 518000, China; 3. Computing Platform, Alibaba Cloud Computing Company, Hangzhou 310000, China
Abstract

The growing scale and complexity of component interactions in cloud computing systems pose great challenges for operators trying to understand system performance characteristics. Profiling has long proven to be an effective approach to performance analysis; however, existing approaches face new challenges in cloud computing systems. First, profiling efficiency becomes a critical concern; second, service-oriented profiling is needed to support separation-of-concerns performance analysis. To address these issues, we present P-Tracer, an online performance profiling tool specifically tailored for cloud computing systems. First, P-Tracer constructs a dedicated search engine that proactively processes performance logs and generates an index for fast queries; second, for each service, P-Tracer derives statistical insight into performance characteristics along multiple dimensions and provides operators with a suite of web-based interfaces for querying the critical information. We evaluate P-Tracer in terms of tracing overhead, data preprocessing scalability, and querying efficiency. Three real-world case studies from the Alibaba cloud computing platform demonstrate that P-Tracer can help operators understand software behavior and localize the primary causes of performance anomalies effectively and efficiently.
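
The two ideas in the abstract, indexing performance logs ahead of query time and summarizing each service statistically, can be illustrated with a minimal sketch. This is not P-Tracer's implementation: the record layout `(service, host, latency_ms)`, the sample data, the function names `build_index` and `service_profile`, and the choice of the coefficient of variation as a dispersion measure are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical log records: (service, host, latency_ms). Illustrative data only.
LOGS = [
    ("storage", "host-a", 12.0),
    ("storage", "host-b", 18.0),
    ("storage", "host-a", 30.0),
    ("scheduler", "host-c", 5.0),
    ("scheduler", "host-c", 5.0),
]


def build_index(records):
    """Map each service name to the positions of its records.

    Building this once, ahead of time, means a later query reads only
    the records of the requested service instead of scanning all logs.
    """
    index = defaultdict(list)
    for pos, (service, _host, _latency) in enumerate(records):
        index[service].append(pos)
    return index


def service_profile(records, index, service):
    """Return simple latency statistics for one service, or None if unknown."""
    latencies = [records[pos][2] for pos in index.get(service, [])]
    if not latencies:
        return None
    avg = mean(latencies)
    # Coefficient of variation: population standard deviation over the mean.
    cv = pstdev(latencies) / avg if avg else 0.0
    return {
        "count": len(latencies),
        "mean_ms": avg,
        "max_ms": max(latencies),
        "cv": cv,
    }


INDEX = build_index(LOGS)
```

Calling `service_profile(LOGS, INDEX, "storage")` returns the record count, mean, maximum, and coefficient of variation for that service alone; a production system would persist the index and update it incrementally as logs arrive.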

Keywords: cloud computing, performance profiling, performance anomaly, visual analytics
Corresponding Author(s): MI Haibo, Email: rainmhb@gmail.com
Issue Date: 01 June 2013
 Cite this article:   
Haibo MI, Huaimin WANG, Yangfan ZHOU, et al. An online service-oriented performance profiling tool for cloud computing systems[J]. Front Comput Sci, 2013, 7(3): 431-445.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-013-2193-4
https://academic.hep.com.cn/fcs/EN/Y2013/V7/I3/431