An online service-oriented performance profiling tool for cloud computing systems
Haibo MI¹, Huaimin WANG¹, Yangfan ZHOU², Michael Rung-Tsong LYU², Hua CAI³, Gang YIN¹
1. National Lab for Parallel & Distributed Processing, National University of Defense Technology, Changsha 410073, China; 2. Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen 518000, China; 3. Computing Platform, Alibaba Cloud Computing Company, Hangzhou 310000, China
Abstract The growing scale and complexity of component interactions in cloud computing systems pose great challenges for operators trying to understand the characteristics of system performance. Profiling has long been proven an effective approach to performance analysis; however, existing approaches face new challenges that emerge in cloud computing systems. First, the efficiency of profiling becomes a critical concern; second, service-oriented profiling should be considered to support separation-of-concerns performance analysis. To address these issues, we present P-Tracer, an online performance profiling tool specifically tailored for cloud computing systems. First, P-Tracer constructs a specific search engine that proactively processes performance logs and generates a particular index for fast queries; second, for each service, P-Tracer derives statistical insight into performance characteristics along multiple dimensions and provides operators with a suite of web-based interfaces to query the critical information. We evaluate P-Tracer in terms of tracing overhead, data preprocessing scalability, and querying efficiency. Three real-world case studies from the Alibaba cloud computing platform demonstrate that P-Tracer can help operators understand software behaviors and localize the primary causes of performance anomalies effectively and efficiently.
Keywords
cloud computing
performance profiling
performance anomaly
visual analytics
Corresponding Author(s):
MI Haibo, Email: rainmhb@gmail.com
Issue Date: 01 June 2013