Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (2) : 172317    https://doi.org/10.1007/s11704-022-1270-y
RESEARCH ARTICLE
Human-machine interactive streaming anomaly detection by online self-adaptive forest
Qingyang LI, Zhiwen YU, Huang XU, Bin GUO
School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
Abstract

Anomaly detectors distinguish normal from abnormal data, usually by computing and ranking an anomaly score for each instance. A static unsupervised streaming anomaly detector cannot easily adjust how it calculates anomaly scores as the stream evolves. In real scenarios, anomaly detection often needs to be regulated by human feedback, which helps to adjust the detector. In this paper, we propose a human-machine interactive streaming anomaly detection method, named ISPForest, which can be adaptively updated online under the guidance of human feedback. In particular, the feedback is used to adjust both the anomaly score calculation and the structure of the detector, ideally yielding more accurate anomaly scores in the future. Our main contribution is an improved tree-based streaming anomaly detection model that can be updated online in terms of both anomaly score calculation and model structure. Our approach is instantiated for the powerful class of tree-based streaming anomaly detectors, and we conduct experiments on a range of benchmark datasets. The results demonstrate that incorporating feedback can improve the performance of anomaly detectors with little human effort.

Keywords: anomaly detection, human-machine interaction, human feedback, random space tree, ensemble method
Corresponding Author(s): Zhiwen YU   
Just Accepted Date: 18 February 2022   Issue Date: 02 August 2022
 Cite this article:   
Qingyang LI,Zhiwen YU,Huang XU, et al. Human-machine interactive streaming anomaly detection by online self-adaptive forest[J]. Front. Comput. Sci., 2023, 17(2): 172317.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1270-y
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I2/172317
Methods/Properties iForest [5] RS-Forest [13] HS-Trees [12] LDOF [16] Loda [14] SA [24] AI2 [25] IF-AAD [26] ISPForest
Static
Streaming
Feedback
Subspaces
Emerging new classes
Model retraining
Data accumulation
Tab.1  Comparing ISPForest with state-of-the-art anomaly detection techniques in terms of various properties
Fig.1  The workflow of the anomaly detector. The detector reports an anomaly score y when an instance x is fed to the system. Afterward, the corresponding feedback t is given by a human expert, which is used to update the calculation of anomaly scores for the detector
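The score-then-feedback loop described in the Fig.1 caption can be sketched as follows. This is a minimal illustration, not the ISPForest update rule: `ToyDetector`, its `weight` and `rate` fields, and the multiplicative update are all hypothetical stand-ins chosen only to show how x, y, and t flow through the loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ToyDetector:
    """Hypothetical stand-in for a streaming detector (not ISPForest)."""
    center: float = 0.0   # running estimate of "normal" behaviour
    weight: float = 1.0   # scales the raw score; adjusted by feedback
    rate: float = 0.1     # assumed step size for the feedback update

    def score(self, x: float) -> float:
        # Anomaly score y: weighted distance of x from the running center.
        return self.weight * abs(x - self.center)

    def update(self, x: float, y: float, t: int) -> None:
        # t = 1: the expert confirmed an anomaly; t = 0: the expert said normal.
        # y is unused by this toy rule but is part of the interface.
        if t == 1:
            self.weight *= 1.0 + self.rate
        else:
            self.weight *= 1.0 - self.rate
            self.center += self.rate * (x - self.center)


def run_stream(
    detector: ToyDetector,
    stream: Iterable[float],
    expert: Callable[[float], int],
) -> List[Tuple[float, float, int]]:
    """Feed each instance x, report score y, then apply the feedback t."""
    history = []
    for x in stream:
        y = detector.score(x)      # detector reports y for instance x
        t = expert(x)              # human expert supplies feedback t
        detector.update(x, y, t)   # feedback adjusts future scoring
        history.append((x, y, t))
    return history
```

In this sketch the expert is any callable that returns 0 or 1; in practice the feedback would come from a human reviewing the reported instances.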
  
  
Fig.2  The anomaly score determined by the ith SPT
  
Fig.3  An example of terminal collapsing and expansion on an SPT. Suppose the feature space is two-dimensional. ω denotes the feature selected for partitioning, and θ denotes the split point. Each terminal node, shown in blue, corresponds to a subregion. The region is partitioned into six subregions, numbered from ① to ⑥. (a) The original SPT and the corresponding region partitioning; (b) the pruned SPT and the corresponding region partitioning
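The two structural operations named in the Fig.3 caption can be illustrated on a small binary partitioning tree. The sketch below assumes that collapsing merges two sibling terminal nodes back into their parent and that expansion splits a terminal node on a feature ω at a split point θ; the caption does not state the exact rule, so the `Node` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None    # ω: index of the split feature (None if terminal)
    split: Optional[float] = None    # θ: split point (None if terminal)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_terminal(self) -> bool:
        return self.left is None and self.right is None


def expand(node: Node, feature: int, split: float) -> None:
    """Split a terminal node into two child subregions."""
    if not node.is_terminal():
        raise ValueError("only terminal nodes can be expanded")
    node.feature, node.split = feature, split
    node.left, node.right = Node(), Node()


def collapse(node: Node) -> None:
    """Merge two sibling terminal nodes back into their parent."""
    if node.is_terminal():
        raise ValueError("node has no children to collapse")
    if not (node.left.is_terminal() and node.right.is_terminal()):
        raise ValueError("both children must be terminal nodes")
    node.feature = node.split = None
    node.left = node.right = None


def count_terminals(node: Node) -> int:
    """Number of subregions in the current partitioning."""
    if node.is_terminal():
        return 1
    return count_terminals(node.left) + count_terminals(node.right)
```

Each expansion adds one subregion and each collapse removes one, so `count_terminals` tracks how fine the partitioning currently is.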
  
Dataset      Total instances   Dimensions   Anomalies
Smtp         95,156            3            30 (0.03%)
Http         567,498           3            2,211 (0.39%)
Smtp+Http    662,654           3            2,241 (0.34%)
Shuttle      49,097            9            3,511 (7.15%)
Breastw      683               9            283 (35%)
Annthyroid   7,200             21           534 (7.42%)
Cardio       2,126             23           471 (22.15%)
Tab.2  A summary of datasets used in the experiments
Dataset     iForest  HS-Trees  RS-Forest  LDOF    Loda    SA      MStream  AI2     IF-AAD  ISPForest
Smtp        0.8831   0.8751    0.8823     0.8769  0.9945  0.9974  0.9315   0.9962  0.9936  0.9983
Http        1.0000   0.9964    0.9991     0.9854  0.9982  0.9775  0.9832   0.9994  0.9982  0.9999
Smtp+Http   0.9972   0.9965    0.9982     0.9396  0.9927  0.9891  0.9623   0.9944  0.9943  0.9993
Shuttle     1.0000   0.9394    0.9980     0.9388  0.9925  0.9922  0.9868   0.9986  0.9995  0.9999
Breastw     0.9905   0.9924    0.9857     0.9983  0.9880  0.9914  0.9803   0.9988  0.9974  0.9992
Annthyroid  0.8233   0.7121    0.7327     0.7264  0.7592  0.8052  0.8197   0.8081  0.8224  0.8435
Cardio      0.8656   0.8417    0.7474     0.6983  0.7008  0.8112  0.8419   0.8331  0.8684  0.8982
Tab.3  Performance comparison of different methods on all benchmark datasets, measured by AUC. Bold and underlined results indicate the best and second-best methods on each dataset, respectively
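For readers unfamiliar with the metric in Tab.3, AUC can be read as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal instance, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the label convention (1 for anomaly, 0 for normal) are assumptions for illustration, and this is not necessarily the evaluation code used for the table.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (rank-statistic) AUC; labels: 1 = anomaly, 0 = normal."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # anomaly ranked above normal
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of instances, which is fine for illustration; a sort-based rank computation would be preferable on the larger datasets.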
Fig.4  Runtime comparison on all benchmark datasets
Fig.5  Effects of varying depth limit and window size. (a) Effect of depth limit; (b) effect of window size
Fig.6  Effects of varying adaptive parameter on all benchmark datasets
Fig.7  Effects of feedback with increased batches on three datasets. (a) Number of feedback; (b) changes of AUC
  
  
  
  
1 D M Hawkins. Identification of Outliers. London: Chapman and Hall, 1980
2 C C Aggarwal. Outlier analysis. In: Aggarwal C C, ed. Data Mining. Cham: Springer, 2015, 237–263
3 U Fiore, A De Santis, F Perla, P Zanetti, F Palmieri. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 2019, 479: 448–455
4 V S Tseng, J C Ying, C W Huang, Y Kao, K T Chen. FrauDetector: a graph-mining-based framework for fraudulent phone call detection. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, 2157–2166
5 F T Liu, K M Ting, Z H Zhou. Isolation forest. In: Proceedings of the 8th IEEE International Conference on Data Mining. 2008, 413–422
6 X Yang, L J Latecki, D Pokrajac. Outlier detection with globally optimal exemplar-based GMM. In: Proceedings of 2009 SIAM International Conference on Data Mining. 2009, 145–154
7 B Zong, Q Song, M R Min, W Cheng, C Lumezanu, D K Cho, H F Chen. Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In: Proceedings of the 6th International Conference on Learning Representations. 2018
8 E Manzoor, S M Milajerdi, L Akoglu. Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1035–1044
9 H Paulheim, R Meusel. A decomposition of the outlier detection problem into a set of supervised learning problems. Machine Learning, 2015, 100(2): 509–531
10 D Overby, J Wall, J Keyser. Interactive analysis of situational awareness metrics. In: Proceedings of SPIE 8294 Visualization and Data Analysis 2012. 2012, 829406
11 N Cao, C Shi, S Lin, J Lu, Y R Lin, C Y Lin. TargetVue: visual analysis of anomalous user behaviors in online communication systems. IEEE Transactions on Visualization and Computer Graphics, 2016, 22(1): 280–289
12 S C Tan, K M Ting, T F Liu. Fast anomaly detection for streaming data. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011, 1511–1516
13 K Wu, K Zhang, W Fan, A Edwards, P S Yu. RS-Forest: a rapid density estimator for streaming anomaly detection. In: Proceedings of 2014 IEEE International Conference on Data Mining. 2014, 600–609
14 T Pevný. Loda: lightweight on-line detector of anomalies. Machine Learning, 2016, 102(2): 275–304
15 S M Erfani, S Rajasegarar, S Karunasekera, C Leckie. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 2016, 58: 121–134
16 K Zhang, M Hutter, H Jin. A new local distance-based outlier detection approach for scattered real-world data. In: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2009, 813–822
17 S Guha, N Mishra, G Roy, O Schrijvers. Robust random cut forest based anomaly detection on streams. In: Proceedings of the 33rd International Conference on Machine Learning. 2016, 2712–2721
18 X Mu, K M Ting, Z H Zhou. Classification under streaming emerging new classes: a solution using completely-random trees. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(8): 1605–1618
19 H M Gomes, A Bifet, J Read, J P Barddal, F Enembreck, B Pfahringer, G Holmes, T Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, 2017, 106(9–10): 1469–1495
20 S Ahmad, A Lavin, S Purdy, Z Agha. Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 2017, 262: 134–147
21 P Malhotra, L Vig, G Shroff, P Agarwal. Long short term memory networks for anomaly detection in time series. In: Proceedings of the 23rd European Symposium on Artificial Neural Networks. 2015, 89–94
22 J Qiu, Q Du, C Qian. KPI-TSAD: a time-series anomaly detector for KPI monitoring in cloud applications. Symmetry, 2019, 11(11): 1350
23 M Munir, S A Siddiqui, A Dengel, S Ahmed. DeepAnT: a deep learning approach for unsupervised anomaly detection in time series. IEEE Access, 2018, 7: 1991–2005
24 Y Dong, N Japkowicz. Threaded ensembles of autoencoders for stream learning. Computational Intelligence, 2018, 34(1): 261–281
25 K Veeramachaneni, I Arnaldo, V Korrapati, C Bassias, K Li. AI2: training a big data machine to defend. In: Proceedings of the 2nd IEEE International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). 2016, 49–54
26 S Das, W K Wong, A Fern, T G Dietterich, M A Siddiqui. Incorporating feedback into tree-based anomaly detection. 2017, arXiv preprint arXiv:1708.09441
27 S Das, W K Wong, T Dietterich, A Fern, A Emmott. Incorporating expert feedback into active anomaly discovery. In: Proceedings of the 16th IEEE International Conference on Data Mining (ICDM). 2016, 853–858
28 K M Ting, G T Zhou, F T Liu, J S C Tan. Mass estimation and its applications. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 989–998
29 B P Welford. Note on a method for calculating corrected sums of squares and products. Technometrics, 1962, 4(3): 419–420
30 S Bhatia, A Jain, P Li, R Kumar, B Hooi. MStream: fast anomaly detection in multi-aspect streams. In: Proceedings of the Web Conference 2021. 2021, 3371–3382
31 D J Hand, R J Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 2001, 45(2): 171–186
32 B Schölkopf, R C Williamson, A J Smola, J Shawe-Taylor, J C Platt. Support vector method for novelty detection. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. 1999, 582–588
33 M M Breunig, H P Kriegel, R T Ng, J Sander. LOF: identifying density-based local outliers. In: Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. 2000, 93–104