|
|
A novel classifier for multivariate instance using graph class signatures |
Parnika PARANJAPE( ), Meera DHABU, Parag DESHPANDE |
Department of Computer Science, Visvesvaraya National Institute of Technology, Nagpur-440010, Maharashtra, India |
|
|
Abstract Applications like identifying different customers from their unique buying behaviours, determining ratingsof a product given by users based on different sets of features, etc. require classification using class-specific subsets of features. Most of the existing state-of-the-art classifiers for multivariate data use complete feature set for classification regardless of the different class labels. Decision tree classifier can produce class-wise subsets of features. However, none of these classifiers model the relationship between features which may enhance classification accuracy. We call the class-specific subsets of features and the features’ interrelationships as class signatures. In this work, we propose to map the original input space of multivariate data to the feature space characterized by connected graphs as graphs can easily model entities, their attributes, and relationships among attributes. Mostly, entities are modeled using graphs, where graphs occur naturally, for example, chemical compounds. However, graphs do not occur naturally in multivariate data. Thus, extracting class signatures from multivariate data is a challenging task. We propose some feature selection heuristics to obtain class-specific prominent subgraph signatures. We also propose two variants of class signatures based classifier namely: 1) maximum matching signature (gMM), and 2) score and size of matched signatures (gSM). The effectiveness of the proposed approach on real-world and synthetic datasets has been studied and compared with other established classifiers. Experimental results confirm the ascendancy of the proposed class signatures based classifier on most of the datasets.
|
Keywords
multivariate instance
data transformation
subgraph feature selection
class signatures
classification
|
Corresponding Author(s):
Parnika PARANJAPE
|
Just Accepted Date: 18 March 2019
Issue Date: 11 March 2020
|
|
1 |
S Theodoridis, K Koutroumbas. Pattern Recognition. 4th ed. USA: Academic Press, Inc., 2008
|
2 |
S Esmeir, S Markovitch. Lookahead-based algorithms for anytime induction of decision trees. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 33
https://doi.org/10.1145/1015330.1015373
|
3 |
P J Tan, D L Dowe. MML inference of decision graphs with multiway joins and dynamic attributes. In: Proceedings of Australasian Joint Conference on Artificial Intelligence. 2003, 269–281
https://doi.org/10.1007/978-3-540-24581-0_23
|
4 |
Y Zhu, J X Yu, H Cheng, L Qin. Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 205–214
https://doi.org/10.1145/2396761.2396791
|
5 |
S Ranu, M Hoang, A K Singh. Mining discriminative subgraphs from global-state networks. In: Proceedings of the 19th ACM International Conference on Knowledge Discovery and DataMining. 2013, 509–517
https://doi.org/10.1145/2487575.2487692
|
6 |
X H Dang, A K, Singh P Bogdanov, H You, B Hsu. Discriminative subnetworks with regularized spectral learning for global-state network data. In: Proceedings of Joint European Conference onMachine Learning and Knowledge Discovery in Databases. 2014, 290–306
https://doi.org/10.1007/978-3-662-44848-9_19
|
7 |
X H Dang, H You, P Bogdanov, A K Singh. Learning predictive substructures with regularization for network data. In: Proceedings of IEEE International Conference on Data Mining. 2015, 81–90
https://doi.org/10.1109/ICDM.2015.56
|
8 |
J P Vert. Kernel methods in computational biology. Bussei Kenkyu, 2004, 81(97): 35–90
|
9 |
P Mahhé, N Ueda, T Akutsu, J L Perret. Extensions of marginalized graph kernels. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 70
https://doi.org/10.1145/1015330.1015446
|
10 |
M Seeland, A Karwath, S Kramer. A structural cluster kernel for learning on graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2012, 516–524
https://doi.org/10.1145/2339530.2339614
|
11 |
K Riesen, H Bunke. Graph classification by means of Lipschitz embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(6): 1472–1483
https://doi.org/10.1109/TSMCB.2009.2019264
|
12 |
D Bandyopadhyay, J Huan, J Liu, J Prins, J Snoeyink, W Wang, A Tropsha. Structure-based function inference using protein family-specific fingerprints. Protein Science, 2006, 15(6): 1537–1543
https://doi.org/10.1110/ps.062189906
|
13 |
H Cheng, X Yan, J Han, C W Hsu. Discriminative frequent pattern analysis for effective classification. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 716–725
https://doi.org/10.1109/ICDE.2007.367917
|
14 |
M Deshpande, M Kuramochi, N Wale, G. KarypisFrequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8): 1036–1050
https://doi.org/10.1109/TKDE.2005.127
|
15 |
X Yan, HanJ. gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE International Conference on Data Mining. 2002, 721–724
|
16 |
A Inokuchi, T Washio, H Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. 2000, 13–23
https://doi.org/10.1007/3-540-45372-5_2
|
17 |
M Kuramochi, G Karypis. Frequent subgraph discovery. In: Proceedings of the IEEE International Conference on Data Mining. 2001, 313–320
|
18 |
J Huan, W Wang, J Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 549–552
|
19 |
S Nijssen, J N Kok. A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2004, 647–652
https://doi.org/10.1145/1014052.1014134
|
20 |
K Zhou. gBolt: very fast implementation for gSpan algorithm in data mining. GitHub, 2017
|
21 |
X Yan, H Cheng, J Han, P S Yu. Mining significant graph patterns by leap search. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 433–444
https://doi.org/10.1145/1376616.1376662
|
22 |
H Saigo, N Krämer, K Tsuda. Partial least squares regression for graph mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 578–586
https://doi.org/10.1145/1401890.1401961
|
23 |
M Thoma, H Cheng, A Gretton, J Han, H P Kriegel, A Smola, L Song, P S Yu, X Yan, K Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SIAM International Conference on Data Mining. 2009, 1076–1087
https://doi.org/10.1137/1.9781611972795.92
|
24 |
S Ranu, A K Singh. Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 844–855
https://doi.org/10.1109/ICDE.2009.133
|
25 |
N Jin, C Young, W Wang. Graph classification based on pattern cooccurrence. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 573–582
https://doi.org/10.1145/1645953.1646027
|
26 |
N Jin, C Young, W Wang. GAIA: graph classification using evolutionary computation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 879–890
https://doi.org/10.1145/1807167.1807262
|
27 |
X Kong, P S Yu. Semi-supervised feature selection for graph classification. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 793–802
https://doi.org/10.1145/1835804.1835905
|
28 |
X Kong, S Y. PhilipgMLC: a multi-label feature selection framework for graph classification. Knowledge and Information Systems, 2012, 31(2): 281–305
https://doi.org/10.1007/s10115-011-0407-3
|
29 |
S Pan, X Zhu, C Zhang, S Y Philip. Graph stream classification using labeled and unlabeled graphs. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 398–409
|
30 |
X Kong, P S Yu, X Wang, A B Ragin. Discriminative feature selection for uncertain graph classification. In: Proceedings of the SIAM International Conference on Data Mining. 2013, 82–93
https://doi.org/10.1137/1.9781611972832.10
|
31 |
J Wu, X Zhu, C Zhang, S Y Philip. Bag constrained structure pattern mining for multi-graph classification. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2382–2396
https://doi.org/10.1109/TKDE.2013.2297923
|
32 |
S Pan, J Wu, X Zhu, C Zhang. Joint structure feature exploration and regularization for multi-task graph classification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(3): 715–728
https://doi.org/10.1109/TKDE.2015.2492567
|
33 |
S Pan, J Wu, X Zhu, G Long, C Zhang. Task sensitive feature exploration and learning for multitask graph classification. IEEE Transactions on Cybernetics, 2017, 47(3): 744–758
https://doi.org/10.1109/TCYB.2016.2526058
|
34 |
S Pan, J Wu, X Zhu, G Long, C Zhang. Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognition, 2015, 48(11): 3783–3796
https://doi.org/10.1016/j.patcog.2015.05.019
|
35 |
H Saigo, S Nowozin, T Kadowaki, T Kudo, K Tsuda. gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 2009, 75(1): 69–89
https://doi.org/10.1007/s10994-008-5089-z
|
36 |
S Pan, X Zhu. Graph classification with imbalanced class distributions and noise. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013, 1586–1592
|
37 |
T Kudo, E Maeda, Y Matsumoto. An application of boosting to graph classification. In: Proceedings of the 17th International Conference on Neural Information Processing Systems. 2004, 729–736
|
38 |
S Pan, J Wu, X Zhu, C Zhang. Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Transactions on Cybernetics, 2015, 45(5): 954–968
https://doi.org/10.1109/TCYB.2014.2341031
|
39 |
S Pan, J Wu, X Zhu. CogBoost: boosting for fast cost-sensitive graph classification. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(11): 2933–2946
https://doi.org/10.1109/TKDE.2015.2391115
|
40 |
D J Cook, L B Holder. Mining Graph Data. John Wiley & Sons, 2006
https://doi.org/10.1002/0470073047
|
41 |
I H Witten , E Frank, M A Hall, C J Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016
|
42 |
S B Thrun, J W Bala, E Bloedorn, I Bratko, B Cestnik, J Cheng, K A De Jong, S Dzeroski, D H Fisher, S E Fahlman, R Hamann. The monk’s problems: a performance comparison of different learning algorithms. Technical Report of Carnegie Mellon University, 1991
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|