Human-machine interactive streaming anomaly detection by online self-adaptive forest
Qingyang LI, Zhiwen YU, Huang XU, Bin GUO
School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
|
Abstract
Anomaly detectors distinguish normal data from abnormal data, usually by computing an anomaly score for each instance and ranking the instances by that score. A static, unsupervised streaming anomaly detector, however, cannot easily adjust how it calculates anomaly scores. In real scenarios, anomaly detection often needs to be guided by human feedback, which helps to adjust the detector. In this paper, we propose a human-machine interactive streaming anomaly detection method, named ISPForest, which is updated adaptively online under the guidance of human feedback. In particular, the feedback is used to adjust both the anomaly score calculation and the structure of the detector, ideally yielding more accurate anomaly scores for subsequent instances. Our main contribution is to improve tree-based streaming anomaly detection models so that they can be updated online in terms of both anomaly score calculation and model structure. Our approach is instantiated for the powerful class of tree-based streaming anomaly detectors, and we conduct experiments on a range of benchmark datasets. The results demonstrate that incorporating feedback improves the performance of anomaly detectors with only a small amount of human effort.
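The abstract describes two ingredients: ranking instances by anomaly score, and letting analyst feedback change how those scores are computed. The sketch below illustrates that loop in its simplest form. It assumes an ensemble whose per-tree scores are combined by a weighted sum, and a hypothetical additive weight update triggered by a single analyst label. The names `combined_score`, `rank_instances`, `feedback_update` and the learning rate `lr` are illustrative only. This is not the ISPForest update rule, which additionally modifies the tree structure; the sketch only shows how a label can reorder a score-based ranking.

```python
from typing import List, Sequence


def combined_score(tree_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-tree anomaly scores (higher means more anomalous)."""
    return sum(w * s for w, s in zip(weights, tree_scores))


def rank_instances(score_matrix: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[int]:
    """Return instance indices ordered from most to least anomalous."""
    scores = [combined_score(row, weights) for row in score_matrix]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def feedback_update(weights: Sequence[float], tree_scores: Sequence[float],
                    is_anomaly: bool, lr: float = 0.5) -> List[float]:
    """Hypothetical additive update driven by one analyst label (assumption).

    Trees that scored a confirmed anomaly highly gain weight; trees that
    scored a confirmed nominal instance highly lose weight. Weights are
    clipped at zero and renormalised to sum to one.
    """
    sign = 1.0 if is_anomaly else -1.0
    updated = [max(0.0, w + lr * sign * s) for w, s in zip(weights, tree_scores)]
    total = sum(updated)
    if total == 0.0:
        # Every weight was clipped away: fall back to uniform weights.
        return [1.0 / len(updated)] * len(updated)
    return [w / total for w in updated]


# Toy data: rows are instances, columns are the scores of two trees.
score_matrix = [
    [0.9, 0.2],  # instance 0: flagged by tree 0 only (actually nominal)
    [0.3, 0.7],  # instance 1: flagged by tree 1 only (actually anomalous)
    [0.1, 0.1],  # instance 2: nominal according to both trees
]
weights = [0.5, 0.5]

print(rank_instances(score_matrix, weights))  # [0, 1, 2]

# The analyst inspects the top-ranked instance and labels it nominal.
top = rank_instances(score_matrix, weights)[0]
weights = feedback_update(weights, score_matrix[top], is_anomaly=False)

print(rank_instances(score_matrix, weights))  # [1, 0, 2]
```

In this toy example a single false-positive label shifts weight from tree 0 to tree 1, which moves the true anomaly to the top of the ranking. The additive rule was chosen only for brevity; a real interactive detector would also need to decide how strongly one label should move the weights and, as the abstract notes for ISPForest, may adapt the trees themselves.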
Keywords
anomaly detection
human-machine interaction
human feedback
random space tree
ensemble method

Corresponding Author(s):
Zhiwen YU

Just Accepted Date: 18 February 2022
Issue Date: 02 August 2022