Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci., 2025, Vol. 19, Issue (1): 191201. https://doi.org/10.1007/s11704-023-2701-0
Software
Performance issue monitoring, identification and diagnosis of SaaS software: a survey
Rui WANG1, Xiangbo TIAN2, Shi YING2
1. College of Mining and Safety Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2. School of Computer Science, Wuhan University, Wuhan 430072, China
Abstract

SaaS (Software-as-a-Service) is a service model provided by cloud computing. Because it delivers software as a service, it places high demands on QoS (Quality of Service). However, manually identifying and diagnosing performance issues is typically expensive and laborious, owing to the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues in SaaS software. In this survey, we comprehensively review methods for automatically identifying and diagnosing performance issues in SaaS software. We divide them by function into three steps: performance log generation, performance issue identification and performance issue diagnosis, and we review the methods for each step in the order of their development. We also present our own proposed solution for each step. Finally, experiments demonstrate the effectiveness of our proposed methods.

Keywords: SaaS software; performance log generation; performance issue identification; performance issue diagnosis
Corresponding Author(s): Shi YING   
Just Accepted Date: 07 November 2023   Issue Date: 06 February 2024
 Cite this article:   
Rui WANG,Xiangbo TIAN,Shi YING. Performance issue monitoring, identification and diagnosis of SaaS software: a survey[J]. Front. Comput. Sci., 2025, 19(1): 191201.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2701-0
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I1/191201
Fig.1  The basic architecture of performance issue identification and diagnosis of SaaS software
Step | Techniques | Main functions
Performance log generation | Log scraping and system monitoring | Providing the raw material
Performance issue identification | Metric-based, log-based and behavior-based methods | Judging whether a performance issue exists
Performance issue diagnosis | Log-based analysis and process-monitoring-based approaches | Classifying performance issues
Tab.1  Details of the three steps
Notations Descriptions
St The state of the SLO at time t.
mi The ith metric value of Mt.
r The requirement of user.
Δt The time interval.
TrΔt The response time of the requirement r in time interval Δt.
tri The arrival time of the ith response.
tsi The response time of the ith response.
ARTΔt The average response time of SaaS software.
si The ith service of SaaS software.
SlowΔt The number of slow requirements in time interval Δt.
NormalΔt The number of normal requirements in time interval Δt.
SARatioΔt The ratio of slow requirements.
Δl The running time of a service.
xt The metric vector of SaaS software at time t.
X The observation set of Xt.
χ The concrete instance of X.
lt The system performance state related to X at time t.
? The system performance state sequence related to X.
τ The time index set.
V(?) The overall potential function of ?.
Z, Z2 The normalizing constants.
N The neighborhoods.
wp The normalizing weight for the total constraint violations in the neighborhood.
vit The output of the ith neuron at time t.
Wijt(?) The connection weight related to ? between ith and jth neurons at time t.
n The number of samples.
m The number of available features.
T The number of snapshots.
F The feature matrix.
Fi The feature matrix of the ith sample.
Tab.2  The notations used in this paper
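Tab.2 defines window-level response-time statistics (ARTΔt, SlowΔt, NormalΔt, SARatioΔt) but no formulas appear in this section. The following is a minimal Python sketch of how they could be computed from per-response records, assuming that a response counts as "slow" when its response time exceeds a fixed threshold and that SARatioΔt = SlowΔt / (SlowΔt + NormalΔt). The class `Response`, the functions `window_stats` and `slo_violated`, both threshold parameters and the SLO rule are hypothetical illustrations, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Response:
    arrival: float        # arrival time of the i-th response (tri in Tab.2), seconds
    response_time: float  # response time of the i-th response (tsi in Tab.2), ms


def window_stats(responses: List[Response], start: float, delta_t: float,
                 slow_threshold_ms: float) -> Tuple[float, int, int, float]:
    """Return (ART, Slow, Normal, SARatio) for responses arriving in [start, start + delta_t).

    Assumption: a response is slow when its response time exceeds slow_threshold_ms.
    """
    window = [r for r in responses if start <= r.arrival < start + delta_t]
    if not window:
        return 0.0, 0, 0, 0.0  # no traffic in this interval
    art = sum(r.response_time for r in window) / len(window)
    slow = sum(1 for r in window if r.response_time > slow_threshold_ms)
    normal = len(window) - slow
    sa_ratio = slow / (slow + normal)
    return art, slow, normal, sa_ratio


def slo_violated(sa_ratio: float, max_slow_ratio: float = 0.1) -> bool:
    """Hypothetical SLO rule: violated when the share of slow responses is too high."""
    return sa_ratio > max_slow_ratio
```

For example, with responses arriving at 0.1 s, 0.5 s and 0.9 s (120 ms, 800 ms and 200 ms) and a 500 ms threshold, the window [0, 1) yields ART ≈ 373.3 ms, one slow and two normal responses, and SARatio = 1/3.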
Log type | Attribute set | Attribute descriptions
Status log | (LT, SID, ST) | LT: time of logging; SID: ID of the recorded object; ST: instantaneous state of the recorded resource
Event log | (LT, SID, EID, ET, ED) | LT: time of logging; SID: ID of the recorded object; EID: event ID; ET: event type; ED: description of the event
Error log | (LT, SID, LEVEL, ET, ED) | LT: time of logging; SID: ID of the recorded object; LEVEL: error level; ET: error type; ED: description of the error
Tab.3  The data formats of the different types of performance logs
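Tab.3 fixes only the attribute set of each log type, not a concrete serialization. As an illustration, the sketch below maps the three formats to typed records and parses a hypothetical pipe-delimited line of the form `TYPE|LT|SID|...`. The type tags (`STATUS`, `EVENT`, `ERROR`), the `|` delimiter, the ISO 8601 timestamps and the name `parse_log_line` are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass
class StatusLog:
    lt: datetime  # LT: time of logging
    sid: str      # SID: ID of the recorded object
    st: str       # ST: instantaneous state of the recorded resource


@dataclass
class EventLog:
    lt: datetime
    sid: str
    eid: str      # EID: event ID
    et: str       # ET: event type
    ed: str       # ED: description of the event


@dataclass
class ErrorLog:
    lt: datetime
    sid: str
    level: str    # LEVEL: error level
    et: str       # ET: error type
    ed: str       # ED: description of the error


LogRecord = Union[StatusLog, EventLog, ErrorLog]

# Number of fields after the (hypothetical) type tag, following Tab.3.
_FIELD_COUNTS = {"STATUS": 3, "EVENT": 5, "ERROR": 5}


def parse_log_line(line: str) -> LogRecord:
    """Parse a hypothetical 'TYPE|LT|SID|...' line into a typed record."""
    tag, rest = line.rstrip("\n").split("|", 1)
    n = _FIELD_COUNTS[tag]  # KeyError for an unknown log type
    # maxsplit keeps any '|' inside the last field (the free-text description).
    fields = rest.split("|", n - 1)
    if len(fields) != n:
        raise ValueError(f"expected {n} fields for {tag}, got {len(fields)}")
    lt = datetime.fromisoformat(fields[0])
    if tag == "STATUS":
        return StatusLog(lt, fields[1], fields[2])
    if tag == "EVENT":
        return EventLog(lt, *fields[1:])
    return ErrorLog(lt, *fields[1:])
```

Bounding the split by the expected field count means a description such as `query took 5s | retrying` survives intact as the ED attribute.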
Fig.2  The categories of log scraping research
Fig.3  The overall framework of Fluentd
Fig.4  The overall framework of Kafka
Fig.5  The framework of performance log generation
Fig.6  The steps of our proposed performance issue identification method
Fig.7  The steps of our proposed performance issue diagnosis method
Layer | Object | Metric | Description
SaaS | Service | Response Time | Response time of the service (ms)
PaaS | MySQL | Threads_connected | Number of connected MySQL server threads (count)
PaaS | MySQL | Threads_running | Number of running MySQL server threads (count)
PaaS | Tomcat | JVM_Free | Size of free JVM memory (MB)
PaaS | Tomcat | Tomcat_requestCount | Total number of Tomcat requests (count)
PaaS | Tomcat | Tomcat_Thread | Number of Tomcat threads (count)
IaaS | CPU | CPU_Cores | Number of CPU cores (count)
IaaS | CPU | CPU_utilization | CPU utilization (%)
IaaS | Memory | Mem_totalSize | Total memory size (GB)
IaaS | Memory | Mem_usagePercent | Memory utilization (%)
IaaS | Disk | Disk_avail | Free disk space (GB)
IaaS | Disk | IO_Read | Read operation rate (Kbps)
IaaS | Disk | IO_Write | Write operation rate (Kbps)
IaaS | Network | NetCard_Receive | Network card receive rate (Kbps)
IaaS | Network | NetCard_Send | Network card send rate (Kbps)
Physical resources | CPU | CPU_Allocated | Allocated CPU (GHz)
Physical resources | CPU | utilization | CPU utilization (%)
Physical resources | Memory | Mem_Allocated | Allocated memory (GB)
Physical resources | Memory | utilization | Memory utilization (%)
Physical resources | Primary storage | Storage_Allocated | Allocated primary storage (TB)
Physical resources | Primary storage | utilization | Storage utilization (%)
Tab.4  Performance metrics
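Tab.2 introduces xt as the metric vector of the SaaS software at time t, and Tab.4 lists candidate metrics across layers. A minimal sketch, assuming each monitoring sample arrives as a dictionary from metric name to value, assembles xt in a fixed metric order and min-max scales each metric over a window of samples. `METRIC_ORDER`, `to_vector` and `min_max_normalize` are hypothetical names, and min-max scaling is only one common preprocessing choice; the authors' own preprocessing is not described in this section.

```python
from typing import Dict, List, Sequence

# A subset of the Tab.4 metric names, in a fixed (hypothetical) order.
METRIC_ORDER: Sequence[str] = (
    "Response Time", "Threads_connected", "Threads_running", "JVM_Free",
    "Tomcat_requestCount", "Tomcat_Thread", "CPU_utilization",
    "Mem_usagePercent", "Disk_avail", "IO_Read", "IO_Write",
    "NetCard_Receive", "NetCard_Send",
)


def to_vector(sample: Dict[str, float],
              order: Sequence[str] = METRIC_ORDER) -> List[float]:
    """Build the metric vector x_t from one sample; fail loudly on a missing metric."""
    missing = [name for name in order if name not in sample]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return [float(sample[name]) for name in order]


def min_max_normalize(vectors: List[List[float]]) -> List[List[float]]:
    """Scale each metric (column) to [0, 1] over the window; constant columns map to 0."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]
```

The physical-resource rows of Tab.4 reuse the name `utilization` for CPU, memory and storage, so a flat name-keyed dictionary would need qualified keys (for example, prefixed with the object name) before those metrics could be included. Static capacities such as CPU_Cores and Mem_totalSize are omitted here on the assumption that they stay constant between samples.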
 | Actual record: Relevant | Actual record: NonRelevant
Classified result: Classified | True positives (TP) | False positives (FP)
Classified result: NonClassified | False negatives (FN) | True negatives (TN)
Tab.5  Evaluation metrics
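Given the counts in Tab.5, precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean, 2·precision·recall/(precision+recall). The sketch below computes these from label sequences for a single positive class; the function names and the `positive` label convention are illustrative choices, not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for one positive class, as laid out in Tab.5."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # flagged, but not actually relevant
        elif actual == positive:
            fn += 1  # relevant, but missed
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1; each is defined as 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, TP = 8, FP = 2 and FN = 1 give precision 0.8, recall 8/9 and F1 = 16/19 ≈ 0.842. When scores are macro-averaged over several issue classes, the reported F1 is usually the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.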
Fig.8  The influence of performance issue identification on service performance
Dataset | NB: GaussianNBC | NB: MultinomialNBC | NB: BernoulliNBC | SVM | KNC | NCC | LR | HMRF
IDRAS1 | 0.86 | 0.69 | 0.77 | 0.90 | 0.87 | 0.88 | 0.89 | 0.89
IDRAS2 | 0.85 | 0.65 | 0.78 | 0.87 | 0.85 | 0.84 | 0.85 | 0.90
IDRAS3 | 0.86 | 0.66 | 0.78 | 0.89 | 0.84 | 0.85 | 0.86 | 0.91
IDRAS4 | 0.84 | 0.70 | 0.79 | 0.88 | 0.83 | 0.84 | 0.85 | 0.88
IDRAS5 | 0.83 | 0.67 | 0.77 | 0.87 | 0.85 | 0.86 | 0.87 | 0.87
AVG. | 0.848 | 0.674 | 0.778 | 0.882 | 0.848 | 0.854 | 0.864 | 0.890
Tab.6  F1 scores of HMRF-MAP and the comparison models
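HMRF-MAP labels each time step with a latent performance state lt and selects the maximum a posteriori (MAP) state sequence. The authors' formulation (neighborhoods N, constraint-violation weight wp, potential function V) is not reproduced here. As a deliberately simplified stand-in, the sketch below uses a different, simpler model: a 1-D chain over time with one Gaussian emission per state and a Potts smoothness penalty `beta` for every change of state. On a chain, this MAP problem is solved exactly by dynamic programming (the Viterbi algorithm). The state means, standard deviations and `beta` are hypothetical parameters that would normally be learned or tuned.

```python
import math
from typing import List, Sequence


def gaussian_neg_log_pdf(x: float, mean: float, std: float) -> float:
    """-log N(x; mean, std^2): the emission cost of observing x in a given state."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


def map_states(obs: Sequence[float], means: Sequence[float],
               stds: Sequence[float], beta: float) -> List[int]:
    """Exact MAP state sequence for a chain MRF minimizing
    sum_t -log p(x_t | l_t) + beta * (number of t with l_t != l_{t-1}).
    """
    k = len(means)
    if not obs:
        return []
    # cost[s]: lowest energy of any labeling of the prefix seen so far ending in state s
    cost = [gaussian_neg_log_pdf(obs[0], means[s], stds[s]) for s in range(k)]
    back: List[List[int]] = []  # back[t-1][s]: best predecessor of state s at step t
    for x in obs[1:]:
        new_cost: List[float] = []
        ptr: List[int] = []
        for s in range(k):
            best_prev, best_val = 0, math.inf
            for p in range(k):
                val = cost[p] + (0.0 if p == s else beta)  # Potts penalty on a change
                if val < best_val:
                    best_prev, best_val = p, val
            ptr.append(best_prev)
            new_cost.append(best_val + gaussian_neg_log_pdf(x, means[s], stds[s]))
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheapest final state.
    state = min(range(k), key=lambda s: cost[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With `beta = 0`, each point is labeled independently, so an isolated spike is flagged as an issue state; raising `beta` trades emission fit against label changes and suppresses one-off spikes while keeping sustained shifts. For the series 0.1, 0.0, 5.2, 0.2, 0.1, 5.0, 5.1, 4.9 with state means 0 and 5 (unit variance), `beta = 0` marks the lone 5.2 as state 1, whereas `beta = 20` keeps only the final three points in state 1.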
Fig.9  Response time when using the performance issue identification method and when using manual identification
Fig.10  The influence of our proposed performance issue diagnosis method on service performance
Model | Precision | Recall | F1
RBM with ICA | 0.88 | 0.91 | 0.89
RBM without ICA | 0.87 | 0.90 | 0.89
GNB | 0.86 | 0.87 | 0.87
DT | 0.84 | 0.86 | 0.85
Boosting | 0.84 | 0.87 | 0.86
ME | 0.83 | 0.86 | 0.86
Tab.7  Precision, recall and F1 of RBM with ICA and the comparison models
Fig.11  Response time when using the performance issue identification method, the performance issue diagnosis method and manual identification
137 J, Liu Y, Hu B, Wu Y, Wang F Xie . A hybrid generalized hidden markov model-based condition monitoring approach for rolling bearings. Sensors, 2017, 17( 5): 1143
138 R, Wang S, Ying C, Sun H, Wan H, Zhang X Jia . Model construction and data management of running log in supporting saas software performance analysis. In: Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (SEKE 2017). 2017, 149−154
139 Fu X, Ren R, Zhan J, Zhou W, Jia Z, Lu G. LogMaster: mining event correlations in logs of large-scale cluster systems. In: Proceedings of the 31st IEEE Symposium on Reliable Distributed Systems. 2012, 71−80
140 Zou D, Qin H, Jin H, Qiang W, Han Z, Chen X. Improving log-based fault diagnosis by log classification. In: Proceedings of the 11th IFIP International Conference on Network and Parallel Computing. 2014, 446−458
141 X, Guo X, Peng H, Wang W, Li H, Jiang D, Ding T, Xie L Su . Graph-based trace analysis for microservice architecture understanding and problem diagnosis. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020, 1387−1397
142 Y, Huo J, Dong Z, Ge P, Xie N, An Y Yang . IWApriori: an association rule mining and self-updating method based on weighted increment. In: Proceedings of the 21st Asia-Pacific Network Operations and Management Symposium (APNOMS). 2020, 167−172
143 L, Wang N, Zhao J, Chen P, Li W, Zhang K Sui . Root-cause metric location for microservice systems via log anomaly detection. In: Proceedings of 2020 IEEE International Conference on Web Services (ICWS). 2020, 142−150
144 Liu D, He C, Peng X, Lin F, Zhang C, Gong S, Li Z, Ou J, Wu Z. MicroHECL: High-efficient root cause localization in large-scale microservice systems. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 2021, 338−347
145 Y, Gan M, Liang S, Dev D, Lo C Delimitrou . Sage: practical and scalable ML-driven performance debugging in microservices. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2021, 135−151
146 M, Ma W, Lin D, Pan P Wang . ServiceRank: root cause identification of anomaly in large-scale microservice architectures. IEEE Transactions on Dependable and Secure Computing, 2022, 19( 5): 3087–3100
147 M, Li Z, Li K, Yin X, Nie W, Zhang K, Sui D Pei . Causal inference-based root cause analysis for online service systems with intervention recognition. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 3230−3240
148 Zhao X, Zhang Y, Lion D, Ullah M F, Luo Y, Yuan D, Stumm M. lprof: a non-intrusive request flow profiler for distributed systems. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. 2014, 629−644
149 K, Bare S P, Kavulya J, Tan X, Pan E, Marinelli M, Kasick R, Gandhi P Narasimhan . ASDF: an automated, online framework for diagnosing performance problems. In: Casimiro A, Lemos R, Gacek C, eds. Architecting Dependable Systems VII. Berlin: Springer, 2010, 201−226
150 Attariyan M, Chow M, Flinn J. X-ray: automating root-cause diagnosis of performance anomalies in production software. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 307−320
151 Malik H, Hemmati H, Hassan A E. Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of the 35th International Conference on Software Engineering (ICSE). 2013, 1012−1021
152 O, Tuncer E, Ates Y, Zhang A, Turk J, Brandt V J, Leung M, Egele A K Coskun . Online diagnosis of performance variation in hpc systems using machine learning. IEEE Transactions on Parallel and Distributed Systems, 2019, 30( 4): 883–896
153 M, Li D, Tang Z, Wen Y Cheng . Universal anomaly detection method based on massive monitoring indicators of cloud platform. In: Proceedings of 2021 IEEE International Conference on Software Engineering and Artifcial Intelligence (SEAI). 2021, 23−29
154 A, Borghesi M, Molan M, Milano A Bartolini . Anomaly detection and anticipation in high performance computing systems. IEEE Transactions on Parallel and Distributed Systems, 2022, 33( 4): 739–750
155 S V Stehman . Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 1997, 62( 1): 77–89
156 D M W Powers . Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. 2020, arXiv preprint arXiv: 2010.16061
157 W W Hsieh . Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels. Cambridge: Cambridge University Press, 2009
158 J, Qin Z S He . A SVM face recognition method based on Gabor-featured key points. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics. 2005, 5144−5149
159 M A, Hearst S T, Dumais E, Osuna J, Platt B Scholkopf . Support vector machines. IEEE Intelligent Systems and Their Applications, 1998, 13( 4): 18–28
160 Rish I. An empirical study of the naive Bayes classifier. In: Proceedings of IJCAI 2001Workshop on Empirical Methods in Artificial Intelligence. 2001, 41–46
161 Larose D T, Larose C D. k-nearest neighbor algorithm. In: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2014, 149–164
162 Manning C, Raghavan P, Schütze H. Vector space classification. In: An Introduction to Information Retrieval. 2009, 289–317
163 Freedman D A. Statistical Models: Theory and Practice. Cambridge, England: Cambridge University Press, 2009
164 P, Lam L, Wang H Y, Ngan N H, Yung A G Yeh . Outlier detection in large-scale traffic data by naïve Bayes method and Gaussian mixture model method. 2015, arXiv preprint arXiv: 1512.08413
165 W Y Loh . Classification and regression trees. WIREs: Data Mining and Knowledge Discovery, 2011, 1( 1): 14–23
166 Freund Y, Schapire R E. A desicion-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the 2nd European Conference on Computational Learning Theory. 1995, 23−37
167 S J, Phillips R P, Anderson R E Schapire . Maximum entropy modeling of species geographic distributions. Ecological Modelling, 2006, 190( 3-4): 231–259