Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (3) : 173322    https://doi.org/10.1007/s11704-022-1376-2
RESEARCH ARTICLE
Heterogeneous clustering via adversarial deep Bayesian generative model
Xulun YE, Jieyu ZHAO
Institute of Computer Science and Technology, Ningbo University, Ningbo 315211, China
Abstract

This paper studies deep clustering of data with heterogeneous features when the number of clusters is unknown. To address this problem, a novel deep Bayesian clustering framework is proposed. First, a heterogeneous feature metric is constructed to measure the similarity between features of different types. Then, a hierarchical sample generation process restricted by this feature metric is established, in which samples with heterogeneous features are clustered by generating them from a similarity-constrained hidden space. To estimate the model parameters and the posterior probability, a corresponding variational inference algorithm is derived and implemented. To verify the capability of the model, we demonstrate it on a synthetic dataset and show its superiority on several real datasets. Our source code is released at Github.com/yexlwh/Heterogeneousclustering.

Keywords: Dirichlet process, heterogeneous clustering, generative adversarial network, Laplacian approximation, variational inference
Corresponding Author(s): Xulun YE   
Just Accepted Date: 27 April 2022   Issue Date: 07 November 2022
 Cite this article:   
Xulun YE,Jieyu ZHAO. Heterogeneous clustering via adversarial deep Bayesian generative model[J]. Front. Comput. Sci., 2023, 17(3): 173322.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1376-2
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I3/173322
Notation        Description
K̂               Ground-truth cluster number
Xs              Samples in the s space
Xt              Samples in the t space
X               Samples regardless of feature space
δis(), δit()    Permutation functions defined at xis and xit
Tst             Metric between feature spaces s and t
Tss, Ttt        Metric within feature space s (respectively t)
T               Heterogeneous metric
L               Graph Laplacian of T
Seq()           Structure order sequence
c0, u0, v0, B0  Hyperparameters for G0
Ω               Hidden variables for the BHAC
α               Hyperparameter for the DP
Ψ()             Digamma function
DP()            Dirichlet process (DP)
GN()            GAN generative network
DN()            GAN discriminative network
K               Maximum cluster number
zn              Cluster indicator
γk, τk, ?n      Variational parameters
w̃n              Laplacian approximation parameters
Tab.1  The main notations and descriptions
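The notation above defines L as the graph Laplacian of the heterogeneous metric T, which is built from the within-space metrics Tss and Ttt and the cross-space metric Tst. The following is a minimal sketch only. The symmetric block layout of T and the unnormalized form L = D - T are assumptions made here for illustration and are not confirmed by this page; the function name `heterogeneous_laplacian` and the toy matrices are hypothetical.

```python
import numpy as np

def heterogeneous_laplacian(T_ss, T_st, T_tt):
    """Assemble a joint similarity matrix T and return its unnormalized Laplacian."""
    # Assumption: T is the symmetric block matrix [[T_ss, T_st], [T_st^T, T_tt]].
    T = np.block([[T_ss, T_st], [T_st.T, T_tt]])
    # Degree matrix: row sums of the similarity matrix on the diagonal.
    degree = np.diag(T.sum(axis=1))
    # Assumption: unnormalized Laplacian L = D - T (a normalized variant is also common).
    return degree - T

# Toy data: 2 samples in space s and 3 samples in space t, with random similarities.
rng = np.random.default_rng(0)
A = rng.random((2, 2))
B = rng.random((3, 3))
T_ss = (A + A.T) / 2          # symmetric within-space similarities for s
T_tt = (B + B.T) / 2          # symmetric within-space similarities for t
T_st = rng.random((2, 3))     # cross-space similarities between s and t

L = heterogeneous_laplacian(T_ss, T_st, T_tt)
```

With non-negative similarities, a Laplacian built this way is symmetric, has zero row sums, and is positive semidefinite, which is what makes it usable as a smoothness constraint across both feature spaces.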
Fig.1  Illustration of data structure with different features. The 1st row shows the original images. The 2nd row shows the images under different lighting conditions. The 3rd row shows the images with SIFT features. The right panel illustrates the feature distribution in 3D space. The figure suggests that data with different features share the same data structure
  
Fig.2  Overview of the optimization process of the proposed deep Bayesian generative network
  
Fig.3  Illustration of BHAC clustering result on the synthetic dataset. (a) and (b) demonstrate the original dataset without class labels; (c) and (d) are the BHAC results
Data      COIL20   COIL100  BALLET   USPS     MNIST
Feature   Original image+noise rotated image
D         100      100      130      70       30
K         70       130      60       60       50
Feature   Original image+SIFT
D         90       90       60       90       40
K         70       130      110      70       40
Tab.2  Parameters used in the five real datasets
Method            COIL20   COIL100  BALLET   USPS     MNIST    COIL20   COIL100  BALLET   USPS     MNIST
Feature           Original image+noise rotated image           Original image+SIFT+TNT
BHAC              0.51     0.54     0.45     0.49     0.47     0.61     0.55     0.23     0.41     0.41
DP-space          0.07     0.02     0.07     0.02     0.01     0.02     0.02     0.01     0.05     0.02
SCAMS             0.26     0.24     0.15     0.07     0.24     0.08     0.03     0.04     0.05     0.11
AutoSC-N          0.22     0.18     0.17     0.18     0.21     0.01     0.12     0.13     0.09     0.13
CFSFDP            0.17     0.21     0.29     0.21     0.34     0.36     0.17     0.19     0.21     0.21
DPM               0.05     0.12     0.03     0.03     0.26     0.06     0.04     0.03     0.07     0.11
GFMM              0.01     0.06     0.04     0.08     0.04     0.04     0.02     0.02     0.06     0.03
BLRASC            0.41     0.03     0.11     0.30     0.37     0.29     0.22     0.27     0.24     0.03
ACIDS             0.32     0.39     0.26     0.32     0.42     0.33     0.13     0.39     0.10     0.17
ClusterGAN        0.27     0.17     0.04     0.23     0.47     0.11     0.13     0.01     0.14     0.04
Spectral-Net      0.29     0.32     0.35     0.28     0.39     0.34     0.17     0.36     0.22     0.21
Feature           Original image+noise rotated image+DAMA      Original image+SIFT+CDLS
DP-space          0.04     0.12     0.02     0.07     0.03     0.06     0.01     0.02     0.02     0.04
SCAMS             0.26     0.25     0.14     0.21     0.27     0.11     0.02     0.02     0.01     0.02
AutoSC-N          0.31     0.37     0.11     0.23     0.31     0.31     0.07     0.06     0.14     0.12
CFSFDP            0.21     0.24     0.17     0.31     0.37     0.39     0.12     0.27     0.21     0.17
DPM               0.17     0.09     0.04     0.06     0.31     0.31     0.02     0.03     0.05     0.13
GFMM              0.12     0.16     0.02     0.06     0.02     0.03     0.01     0.01     0.01     0.04
BLRASC            0.39     0.42     0.27     0.34     0.44     0.37     0.21     0.11     0.17     0.12
ACIDS             0.41     0.44     0.27     0.27     0.51     0.42     0.11     0.36     0.13     0.14
ClusterGAN        0.36     0.32     0.05     0.38     0.58     0.07     0.04     0.07     0.08     0.02
Spectral-Net      0.31     0.40     0.22     0.35     0.54     0.34     0.09     0.29     0.12     0.17
Ground truth      20.0     100.0    44.0     10.0     10.0     20.0     100.0    44.0     10.0     10.0
Estimated number  31.2     127.6    60.4     20.2     21.2     25.1     125.2    7.1      30.1     17.1
Tab.3  Clustering accuracy (NMI) and the estimated cluster number on the real world datasets
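The scores in Tab.3 are normalized mutual information (NMI) values between predicted and ground-truth cluster labels. As a hedged illustration of how such a score can be computed, the sketch below implements NMI from a contingency table. The normalization by the geometric mean of the two entropies is an assumption (the page does not state which normalization the authors use), and the function name `nmi` and the toy labelings are hypothetical.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same samples."""
    labels_true = np.asarray(labels_true).ravel()
    labels_pred = np.asarray(labels_pred).ravel()
    n = labels_true.size
    # Map arbitrary label values to consecutive integer indices.
    _, true_idx = np.unique(labels_true, return_inverse=True)
    _, pred_idx = np.unique(labels_pred, return_inverse=True)
    # Contingency table: sample counts per (true cluster, predicted cluster) pair.
    counts = np.zeros((true_idx.max() + 1, pred_idx.max() + 1))
    np.add.at(counts, (true_idx, pred_idx), 1)
    p_joint = counts / n
    p_true = p_joint.sum(axis=1)
    p_pred = p_joint.sum(axis=0)
    # Mutual information, summing only over non-empty cells to avoid log(0).
    nonzero = p_joint > 0
    expected = np.outer(p_true, p_pred)
    mi = np.sum(p_joint[nonzero] * np.log(p_joint[nonzero] / expected[nonzero]))
    # Entropies of the two labelings (all marginals are strictly positive).
    h_true = -np.sum(p_true * np.log(p_true))
    h_pred = -np.sum(p_pred * np.log(p_pred))
    # Assumption: normalize by the geometric mean of the two entropies.
    denom = np.sqrt(h_true * h_pred)
    # Convention chosen here: return 0.0 when either labeling has a single cluster.
    return float(mi / denom) if denom > 0 else 0.0

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # identical partition, relabeled
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])     # statistically independent labelings
over_split = nmi([0, 0, 1, 1], [0, 1, 2, 3])      # more predicted clusters than true ones
```

NMI is invariant to how clusters are named, so the relabeled partition scores the maximum, while the over-split labeling scores lower under this normalization. This matters when reading the table, since the estimated cluster number often differs from the ground truth.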
Fig.4  Illustration of generated images for the COIL20 and BALLET datasets. (a) and (d) show samples from the original data. (b) and (e) show the samples with noise and rotation. (c) and (f) are the images generated by our model
Data      COIL20    COIL100   BALLET    USPS      MNIST     COIL20    COIL100   BALLET    USPS      MNIST
Feature   Original image+noise rotated image                Original image+SIFT
Time      4813.5    22620.6   78160.3   67296.3   222426.7  3210.2    13108.1   34047.5   35915.9   110821.2
Tab.4  Running time on the five real datasets (seconds)
Fig.5  Effect of the BHAC parameters on the clustering results for the COIL20, COIL100 and BALLET datasets. Panels (a) and (b) show the clustering accuracy (NMI), while panels (c) and (d) show the estimated cluster number
Method              COIL20   COIL100  BALLET
Same parameters     1.51     2.12     0.69
Hyperparameter α    1.67     2.02     0.68
Different strategy  6.55     15.62    3.67
K                   7.20     38.33    0.84
D                   6.31     41.32    0.54
Tab.5  Variance of the estimated cluster number
  
  