Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (2) : 388-403    https://doi.org/10.1007/s11704-018-7289-4
RESEARCH ARTICLE
Meta-path-based outlier detection in heterogeneous information network
Lu LIU1,2,3,4, Shang WANG3,5
1. Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun 130012, China
2. College of Software, Jilin University, Changchun 130012, China
3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
4. College of Communication Engineering, Jilin University, Changchun 130012, China
5. Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark NJ 07102, USA
Abstract

Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in heterogeneous information networks (HINs) and on calculating the similarity between different types of objects. We propose a meta-path-based outlier detection method (MPOutliers) for heterogeneous information networks that addresses these problems together under a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It captures the semantic information among nodes in heterogeneous networks instead of considering only the network structure. It also computes the closeness degree between nodes of the same type and extends this measure to the whole heterogeneous network. Moreover, each node is assigned a reliability weight that measures its degree of authority. Extensive experiments on two real datasets (AMiner and Movies) show that the proposed method is effective and efficient for outlier detection.
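
To make the idea of a reachable probability along a meta-path concrete, here is a minimal sketch. It is not the MPOutliers formula, which the abstract does not give. It assumes the common random-walk reading: each relation in the meta-path is an adjacency matrix, each matrix is row-normalized into transition probabilities, and the matrices are multiplied in path order. The toy author and paper data and the names `row_normalize`, `matmul`, and `reachable_probability` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(adj: Matrix) -> Matrix:
    """Convert an adjacency matrix into row-wise transition probabilities."""
    normalized = []
    for row in adj:
        total = sum(row)
        # A node with no outgoing links keeps an all-zero row.
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def reachable_probability(relations: List[Matrix]) -> Matrix:
    """Probability of reaching each end node from each start node when a
    random walker follows the relations of a meta-path in order
    (assumed random-walk interpretation, not the paper's definition)."""
    result = row_normalize(relations[0])
    for adj in relations[1:]:
        result = matmul(result, row_normalize(adj))
    return result


# Hypothetical toy network: 3 authors (rows) and 2 papers (columns).
author_paper = [
    [1, 1],  # author 0 wrote papers 0 and 1
    [1, 0],  # author 1 wrote paper 0
    [0, 1],  # author 2 wrote paper 1
]
# The reverse relation is the transpose of the forward one.
paper_author = [list(col) for col in zip(*author_paper)]

# Meta-path Author -> Paper -> Author.
apa = reachable_probability([author_paper, paper_author])
for start, row in enumerate(apa):
    print(start, row)
```

Each row of `apa` sums to 1 here because every author and every paper has at least one link. In a sketch like this, an author who is reached with low probability from most other authors sits far from them along that meta-path, which is the kind of signal a meta-path-based outlier score could build on.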

Keywords: data mining; heterogeneous information network; outlier detection; short text similarity
Corresponding Author(s): Lu LIU   
Just Accepted Date: 05 May 2019   Online First Date: 17 September 2019    Issue Date: 16 October 2019
 Cite this article:   
Lu LIU,Shang WANG. Meta-path-based outlier detection in heterogeneous information network[J]. Front. Comput. Sci., 2020, 14(2): 388-403.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7289-4
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I2/388