Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (4) : 144307    https://doi.org/10.1007/s11704-019-8263-5
RESEARCH ARTICLE
A novel classifier for multivariate instance using graph class signatures
Parnika PARANJAPE(), Meera DHABU, Parag DESHPANDE
Department of Computer Science, Visvesvaraya National Institute of Technology, Nagpur-440010, Maharashtra, India
 Download: PDF(787 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Applications like identifying different customers from their unique buying behaviours, determining ratingsof a product given by users based on different sets of features, etc. require classification using class-specific subsets of features. Most of the existing state-of-the-art classifiers for multivariate data use complete feature set for classification regardless of the different class labels. Decision tree classifier can produce class-wise subsets of features. However, none of these classifiers model the relationship between features which may enhance classification accuracy. We call the class-specific subsets of features and the features’ interrelationships as class signatures. In this work, we propose to map the original input space of multivariate data to the feature space characterized by connected graphs as graphs can easily model entities, their attributes, and relationships among attributes. Mostly, entities are modeled using graphs, where graphs occur naturally, for example, chemical compounds. However, graphs do not occur naturally in multivariate data. Thus, extracting class signatures from multivariate data is a challenging task. We propose some feature selection heuristics to obtain class-specific prominent subgraph signatures. We also propose two variants of class signatures based classifier namely: 1) maximum matching signature (gMM), and 2) score and size of matched signatures (gSM). The effectiveness of the proposed approach on real-world and synthetic datasets has been studied and compared with other established classifiers. Experimental results confirm the ascendancy of the proposed class signatures based classifier on most of the datasets.

Keywords multivariate instance      data transformation      subgraph feature selection      class signatures      classification     
Corresponding Author(s): Parnika PARANJAPE   
Just Accepted Date: 18 March 2019   Issue Date: 11 March 2020
 Cite this article:   
Parnika PARANJAPE,Meera DHABU,Parag DESHPANDE. A novel classifier for multivariate instance using graph class signatures[J]. Front. Comput. Sci., 2020, 14(4): 144307.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8263-5
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I4/144307
1 S Theodoridis, K Koutroumbas. Pattern Recognition. 4th ed. USA: Academic Press, Inc., 2008
2 S Esmeir, S Markovitch. Lookahead-based algorithms for anytime induction of decision trees. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 33
https://doi.org/10.1145/1015330.1015373
3 P J Tan, D L Dowe. MML inference of decision graphs with multiway joins and dynamic attributes. In: Proceedings of Australasian Joint Conference on Artificial Intelligence. 2003, 269–281
https://doi.org/10.1007/978-3-540-24581-0_23
4 Y Zhu, J X Yu, H Cheng, L Qin. Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 205–214
https://doi.org/10.1145/2396761.2396791
5 S Ranu, M Hoang, A K Singh. Mining discriminative subgraphs from global-state networks. In: Proceedings of the 19th ACM International Conference on Knowledge Discovery and DataMining. 2013, 509–517
https://doi.org/10.1145/2487575.2487692
6 X H Dang, A K, Singh P Bogdanov, H You, B Hsu. Discriminative subnetworks with regularized spectral learning for global-state network data. In: Proceedings of Joint European Conference onMachine Learning and Knowledge Discovery in Databases. 2014, 290–306
https://doi.org/10.1007/978-3-662-44848-9_19
7 X H Dang, H You, P Bogdanov, A K Singh. Learning predictive substructures with regularization for network data. In: Proceedings of IEEE International Conference on Data Mining. 2015, 81–90
https://doi.org/10.1109/ICDM.2015.56
8 J P Vert. Kernel methods in computational biology. Bussei Kenkyu, 2004, 81(97): 35–90
9 P Mahhé, N Ueda, T Akutsu, J L Perret. Extensions of marginalized graph kernels. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 70
https://doi.org/10.1145/1015330.1015446
10 M Seeland, A Karwath, S Kramer. A structural cluster kernel for learning on graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2012, 516–524
https://doi.org/10.1145/2339530.2339614
11 K Riesen, H Bunke. Graph classification by means of Lipschitz embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(6): 1472–1483
https://doi.org/10.1109/TSMCB.2009.2019264
12 D Bandyopadhyay, J Huan, J Liu, J Prins, J Snoeyink, W Wang, A Tropsha. Structure-based function inference using protein family-specific fingerprints. Protein Science, 2006, 15(6): 1537–1543
https://doi.org/10.1110/ps.062189906
13 H Cheng, X Yan, J Han, C W Hsu. Discriminative frequent pattern analysis for effective classification. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 716–725
https://doi.org/10.1109/ICDE.2007.367917
14 M Deshpande, M Kuramochi, N Wale, G. KarypisFrequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8): 1036–1050
https://doi.org/10.1109/TKDE.2005.127
15 X Yan, HanJ. gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE International Conference on Data Mining. 2002, 721–724
16 A Inokuchi, T Washio, H Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. 2000, 13–23
https://doi.org/10.1007/3-540-45372-5_2
17 M Kuramochi, G Karypis. Frequent subgraph discovery. In: Proceedings of the IEEE International Conference on Data Mining. 2001, 313–320
18 J Huan, W Wang, J Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 549–552
19 S Nijssen, J N Kok. A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2004, 647–652
https://doi.org/10.1145/1014052.1014134
20 K Zhou. gBolt: very fast implementation for gSpan algorithm in data mining. GitHub, 2017
21 X Yan, H Cheng, J Han, P S Yu. Mining significant graph patterns by leap search. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 433–444
https://doi.org/10.1145/1376616.1376662
22 H Saigo, N Krämer, K Tsuda. Partial least squares regression for graph mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 578–586
https://doi.org/10.1145/1401890.1401961
23 M Thoma, H Cheng, A Gretton, J Han, H P Kriegel, A Smola, L Song, P S Yu, X Yan, K Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SIAM International Conference on Data Mining. 2009, 1076–1087
https://doi.org/10.1137/1.9781611972795.92
24 S Ranu, A K Singh. Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 844–855
https://doi.org/10.1109/ICDE.2009.133
25 N Jin, C Young, W Wang. Graph classification based on pattern cooccurrence. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 573–582
https://doi.org/10.1145/1645953.1646027
26 N Jin, C Young, W Wang. GAIA: graph classification using evolutionary computation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 879–890
https://doi.org/10.1145/1807167.1807262
27 X Kong, P S Yu. Semi-supervised feature selection for graph classification. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 793–802
https://doi.org/10.1145/1835804.1835905
28 X Kong, S Y. PhilipgMLC: a multi-label feature selection framework for graph classification. Knowledge and Information Systems, 2012, 31(2): 281–305
https://doi.org/10.1007/s10115-011-0407-3
29 S Pan, X Zhu, C Zhang, S Y Philip. Graph stream classification using labeled and unlabeled graphs. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 398–409
30 X Kong, P S Yu, X Wang, A B Ragin. Discriminative feature selection for uncertain graph classification. In: Proceedings of the SIAM International Conference on Data Mining. 2013, 82–93
https://doi.org/10.1137/1.9781611972832.10
31 J Wu, X Zhu, C Zhang, S Y Philip. Bag constrained structure pattern mining for multi-graph classification. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2382–2396
https://doi.org/10.1109/TKDE.2013.2297923
32 S Pan, J Wu, X Zhu, C Zhang. Joint structure feature exploration and regularization for multi-task graph classification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(3): 715–728
https://doi.org/10.1109/TKDE.2015.2492567
33 S Pan, J Wu, X Zhu, G Long, C Zhang. Task sensitive feature exploration and learning for multitask graph classification. IEEE Transactions on Cybernetics, 2017, 47(3): 744–758
https://doi.org/10.1109/TCYB.2016.2526058
34 S Pan, J Wu, X Zhu, G Long, C Zhang. Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognition, 2015, 48(11): 3783–3796
https://doi.org/10.1016/j.patcog.2015.05.019
35 H Saigo, S Nowozin, T Kadowaki, T Kudo, K Tsuda. gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 2009, 75(1): 69–89
https://doi.org/10.1007/s10994-008-5089-z
36 S Pan, X Zhu. Graph classification with imbalanced class distributions and noise. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013, 1586–1592
37 T Kudo, E Maeda, Y Matsumoto. An application of boosting to graph classification. In: Proceedings of the 17th International Conference on Neural Information Processing Systems. 2004, 729–736
38 S Pan, J Wu, X Zhu, C Zhang. Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Transactions on Cybernetics, 2015, 45(5): 954–968
https://doi.org/10.1109/TCYB.2014.2341031
39 S Pan, J Wu, X Zhu. CogBoost: boosting for fast cost-sensitive graph classification. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(11): 2933–2946
https://doi.org/10.1109/TKDE.2015.2391115
40 D J Cook, L B Holder. Mining Graph Data. John Wiley & Sons, 2006
https://doi.org/10.1002/0470073047
41 I H Witten , E Frank, M A Hall, C J Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016
42 S B Thrun, J W Bala, E Bloedorn, I Bratko, B Cestnik, J Cheng, K A De Jong, S Dzeroski, D H Fisher, S E Fahlman, R Hamann. The monk’s problems: a performance comparison of different learning algorithms. Technical Report of Carnegie Mellon University, 1991
[1] FCS-0007-18263-PP_suppl_1 Download
[1] Yunyun WANG, Jiao HAN, Yating SHEN, Hui XUE. Pointwise manifold regularization for semi-supervised learning[J]. Front. Comput. Sci., 2021, 15(1): 151303-.
[2] Panthadeep BHATTACHARJEE, Pinaki MITRA. A survey of density based clustering algorithms[J]. Front. Comput. Sci., 2021, 15(1): 151308-.
[3] Qianchen YU, Zhiwen YU, Zhu WANG, Xiaofeng WANG, Yongzhi WANG. Estimating posterior inference quality of the relational infinite latent feature model for overlapping community detection[J]. Front. Comput. Sci., 2020, 14(6): 146323-.
[4] Muhammad Aminur RAHAMAN, Mahmood JASIM, Md. Haider ALI, Md. HASANUZZAMAN. Bangla language modeling algorithm for automatic recognition of hand-sign-spelled Bangla sign language[J]. Front. Comput. Sci., 2020, 14(3): 143302-.
[5] Xibin DONG, Zhiwen YU, Wenming CAO, Yifan SHI, Qianli MA. A survey on ensemble learning[J]. Front. Comput. Sci., 2020, 14(2): 241-258.
[6] Hui XUE, Haiming XU, Xiaohong CHEN, Yunyun WANG. A primal perspective for indefinite kernel SVM problem[J]. Front. Comput. Sci., 2020, 14(2): 349-363.
[7] Rizwan Ahmed KHAN, Alexandre MEYER, Hubert KONIK, Saida BOUAKAZ. Saliency-based framework for facial expression recognition[J]. Front. Comput. Sci., 2019, 13(1): 183-198.
[8] Changlong WANG, Zhiyong FENG, Xiaowang ZHANG, Xin WANG, Guozheng RAO, Daoxun FU. ComR: a combined OWL reasoner for ontology classification[J]. Front. Comput. Sci., 2019, 13(1): 139-156.
[9] Munish SAINI, Kuljit Kaur CHAHAL. Change profile analysis of open-source software systems to understand their evolutionary behavior[J]. Front. Comput. Sci., 2018, 12(6): 1105-1124.
[10] Shuaiqiang WANG, Yilong YIN. Polygene-based evolutionary algorithms with frequent pattern mining[J]. Front. Comput. Sci., 2018, 12(5): 950-965.
[11] Wei SHAO,Yi DING,Hong-Bin SHEN,Daoqiang ZHANG. Deep model-based feature extraction for predicting protein subcellular localizations from bio-images[J]. Front. Comput. Sci., 2017, 11(2): 243-252.
[12] Jian-Hao LUO,Wang ZHOU,Jianxin WU. Image categorization with resource constraints: introduction, challenges and advances[J]. Front. Comput. Sci., 2017, 11(1): 13-26.
[13] Sarab ALMUHAIDEB,Mohamed El Bachir MENAI. Impact of preprocessing on medical data classification[J]. Front. Comput. Sci., 2016, 10(6): 1082-1102.
[14] Xin XU,Wei WANG,Jianhong WANG. A three-way incremental-learning algorithm for radar emitter identification[J]. Front. Comput. Sci., 2016, 10(4): 673-688.
[15] Lu YU,Jun XIE,Songcan CHEN,Lei ZHU. Generating labeled samples for hyperspectral image classification using correlation of spectral bands[J]. Front. Comput. Sci., 2016, 10(2): 292-301.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed