Meta-path-based outlier detection in heterogeneous information network
Lu LIU1,2,3,4, Shang WANG3,5
1. Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun 130012, China
2. College of Software, Jilin University, Changchun 130012, China
3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
4. College of Communication Engineering, Jilin University, Changchun 130012, China
5. Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark NJ 07102, USA
Abstract Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in heterogeneous information networks (HINs) and on calculating the similarity between objects of different types. We propose a meta-path-based outlier detection method (MPOutliers) for heterogeneous information networks that addresses both problems within a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It thereby captures the semantic information among nodes in a heterogeneous network rather than considering the network structure alone. It also computes the closeness degree between nodes of the same type, extending the approach to the whole heterogeneous network. Moreover, each node is assigned a reliability weight that measures its degree of authority. Extensive experiments on two real datasets (AMiner and a Movies dataset) show that the proposed method is effective and efficient for outlier detection.
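To make the idea of a reachable probability along a meta-path concrete, the following is a minimal sketch and not the MPOutliers definition itself, which the abstract does not spell out. It assumes a uniform random walk that follows one relation per hop, and it uses a hypothetical toy network (authors, papers, venues) with made-up node names such as `a1` and `p1`.

```python
from typing import Dict, List

# A relation maps each source node to the list of its neighbours.
# All names and data below are hypothetical, for illustration only.
Relation = Dict[str, List[str]]


def step(dist: Dict[str, float], rel: Relation) -> Dict[str, float]:
    """Move probability mass one hop along a relation, split uniformly."""
    out: Dict[str, float] = {}
    for node, p in dist.items():
        nbrs = rel.get(node, [])
        if not nbrs:
            continue  # mass at dead-end nodes is dropped (an assumption)
        share = p / len(nbrs)
        for n in nbrs:
            out[n] = out.get(n, 0.0) + share
    return out


def reach_prob(start: str, meta_path: List[Relation]) -> Dict[str, float]:
    """Probability of ending at each node when walking the meta-path from start."""
    dist = {start: 1.0}
    for rel in meta_path:
        dist = step(dist, rel)
    return dist


# Toy meta-path Author -> Paper -> Venue.
writes: Relation = {"a1": ["p1", "p2"], "a2": ["p3"]}
published_in: Relation = {"p1": ["KDD"], "p2": ["ICDM"], "p3": ["KDD"]}

probs_a1 = reach_prob("a1", [writes, published_in])
probs_a2 = reach_prob("a2", [writes, published_in])
print(probs_a1)
print(probs_a2)
```

In this toy setting, the resulting distributions over venues could be compared between nodes of the same type; how MPOutliers actually combines such probabilities with closeness and authority weights is described in the full paper, not here.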
Keywords: data mining, heterogeneous information network, outlier detection, short text similarity
Corresponding Author(s):
Lu LIU
Just Accepted Date: 05 May 2019
Online First Date: 17 September 2019
Issue Date: 16 October 2019