Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2024, Vol. 18 Issue (3) : 183338    https://doi.org/10.1007/s11704-023-3355-7
RESEARCH ARTICLE
Constrained clustering with weak label prior
Jing ZHANG, Ruidong FAN, Hong TAO, Jiacheng JIANG, Chenping HOU()
College of Sciences, National University of Defense Technology, Changsha 410073, China
Abstract

Clustering is widely used in data mining. Embedding weak label priors into clustering has been shown to improve its performance. Previous research has mainly focused on a single type of prior. In many real scenarios, however, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate both to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), an integrated framework. Within a unified spectral clustering model, pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace the cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. Besides theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP on several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of the proposed approach.

Keywords: clustering; weak label prior; cluster ratio; pairwise constraints
Corresponding Author(s): Chenping HOU   
Just Accepted Date: 06 September 2023   Issue Date: 11 December 2023
 Cite this article:   
Jing ZHANG, Ruidong FAN, Hong TAO, et al. Constrained clustering with weak label prior[J]. Front. Comput. Sci., 2024, 18(3): 183338.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3355-7
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I3/183338
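To make the abstract's idea concrete, the sketch below shows one generic way pairwise constraints can enter a spectral model: fold must-link/cannot-link pairs into the affinity matrix, then embed with the graph Laplacian. This is a minimal illustrative recipe under common spectral-clustering conventions, not the CWLP optimization the paper actually solves (which uses a constraint regularizer in the embedding objective and a label-proportion constraint in spectral rotation); the function name and the `beta` penalty weight are hypothetical.

```python
import numpy as np

def constrained_spectral_embedding(W, must_link, cannot_link, beta=1.0, k=2):
    """Toy constrained spectral embedding (NOT the paper's CWLP method).

    must_link / cannot_link are lists of index pairs (i, j); beta is a
    hypothetical penalty weight controlling how strongly constraints
    reshape the affinity matrix W.
    """
    W = W.copy().astype(float)
    for i, j in must_link:        # must-link: boost similarity
        W[i, j] = W[j, i] = W[i, j] + beta
    for i, j in cannot_link:      # cannot-link: suppress similarity
        W[i, j] = W[j, i] = max(W[i, j] - beta, 0.0)
    D = np.diag(W.sum(axis=1))    # degree matrix
    L = D - W                     # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigh: ascending eigenvalues, L symmetric
    return vecs[:, :k]            # embedding F in R^{n x k}, k smallest eigenvectors
```

Running k-means on the rows of the returned embedding would complete a basic constrained spectral clustering pipeline; CWLP instead derives the discrete labels via spectral rotation.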
Fig.1  (a) False-color images of the hyperspectral image; (b) two must-link samples are linked with a solid red line, and cannot-link samples are indicated with dotted blue lines
Fig.2  Compound weak label prior plays an essential role in increasing clustering confidence
Notations  Descriptions
n  The number of instances
k  The number of classes
q  The dimension of an instance
w  The ratio of the minority cluster
x_i ∈ R^q  The i-th instance
g_i ∈ {0,1}^k  The cluster indicator of the i-th instance
ṽ ∈ R^{n×1}  The cannot-link indicator vector between x_a and x_b, where v(a) = 1, v(b) = −1
G ∈ R^{n×k}  The cluster indicator matrix
V ∈ R^{n×p}  The cannot-link constraint matrix
W ∈ R^{n×n}  The fixed input similarity matrix
S ∈ R^{n×n}  The learned similarity matrix
D ∈ R^{n×n}  The degree matrix
L_S ∈ R^{n×n}  The Laplacian matrix
1_a ∈ R^a  The all-ones vector of length a
I_b ∈ R^{b×b}  The identity matrix
P_G ∈ R^{n×k}  The scaled cluster indicator matrix
Tab.1  Notations
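In the spectral-rotation literature the scaled cluster indicator P_G of Tab.1 is commonly defined as G(G^T G)^{-1/2}, which makes its columns orthonormal. The sketch below constructs it, along with a cannot-link indicator vector, under that assumption; the paper's exact definition may differ in details, and the function names are hypothetical.

```python
import numpy as np

def scaled_indicator(labels, k):
    """Scaled cluster indicator P_G = G (G^T G)^{-1/2}, assuming the common
    spectral-rotation convention (G^T G is diagonal with cluster sizes)."""
    n = len(labels)
    G = np.zeros((n, k))
    G[np.arange(n), labels] = 1.0   # one-hot cluster indicator matrix G
    sizes = G.sum(axis=0)           # diagonal of G^T G: cluster sizes
    return G / np.sqrt(sizes)       # scale each column by 1/sqrt(size)

def cannot_link_vector(n, a, b):
    """Cannot-link indicator v between instances a and b:
    v(a) = 1, v(b) = -1, zeros elsewhere (as in Tab.1)."""
    v = np.zeros(n)
    v[a], v[b] = 1.0, -1.0
    return v
```

The scaling makes P_G^T P_G = I_k, which is what lets P_G approximate an orthonormal spectral embedding more precisely than the raw 0/1 indicator G.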
Datasets K-means Ncut+SR Rcut+SR JSESR PCOG CPCG SCCR CWLP
Glass 0.830(0.002) 0.808(0.002) 0.799(0.003) 0.830(0.002) 1.000(0.000) 0.836(0.000) 1.000(0.000) 1.000(0.000)
glass6 0.814(0.045) 0.864(0.005) 0.864(0.002) 0.828(0.048) 0.911(0.000) 0.801(0.001) 0.865(0.093) 0.972(0.000)
Ecoli3 0.774(0.002) 0.783(0.011) 0.786(0.005) 0.774(0.002) 0.895(0.000) 0.785(0.000) 0.817(0.049) 0.934(0.000)
vowel0 0.523(0.007) 0.552(0.001) 0.552(0.002) 0.617(0.104) 0.461(0.000) 0.595(0.079) 0.843(0.019) 0.849(0.020)
balance-uni 0.512(0.015) 0.586(0.003) 0.586(0.003) 0.534(0.030) 0.808(0.000) 0.571(0.102) 0.855(0.008) 0.856(0.007)
shuttlec0vsc4 0.950(0.023) 0.978(0.003) 0.983(0.022) 0.966(0.030) 0.950(0.000) 0.951(0.024) 0.963(0.013) 0.992(0.018)
dermatology6 0.503(0.005) 0.637(0.001) 0.637(0.002) 0.645(0.103) 0.935(0.095) 0.615(0.216) 0.897(0.025) 0.954(0.030)
shuttle2vs5 0.772(0.111) 0.718(0.000) 0.718(0.000) 0.772(0.111) 0.985(0.000) 0.851(0.183) 0.985(0.006) 0.994(0.006)
Tab.2  The average ACC (std) for all datasets. The best result on each dataset is highlighted in bold
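The ACC reported in Tab.2 is conventionally computed by finding the best one-to-one mapping between predicted cluster labels and ground-truth classes. A minimal sketch of that metric (brute force over label permutations, adequate for the small class counts in these benchmarks; the paper does not specify its exact implementation):

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Clustering ACC: fraction of instances correctly labeled under the best
    one-to-one relabeling of predicted clusters. Brute-force search over
    permutations; use the Hungarian algorithm instead when k is large."""
    labels = sorted(set(y_true) | set(y_pred))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))     # candidate cluster -> class map
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits / len(y_true))
    return best
```

Because cluster IDs are arbitrary, a prediction that swaps the two cluster labels on a binary dataset still scores ACC = 1.0 under this metric.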
Datasets K-means Ncut+SR Rcut+SR JSESR PCOG CPCG SCCR CWLP
Glass 0.731(0.003) 0.704(0.002) 0.695(0.003) 0.731(0.003) 1.000(0.000) 0.739(0.000) 1.000(0.000) 1.000(0.000)
glass6 0.798(0.025) 0.835(0.016) 0.835(0.008) 0.811(0.030) 0.903(0.000) 0.853(0.019) 0.857(0.085) 0.964(0.000)
ecoli3 0.745(0.002) 0.753(0.003) 0.756(0.005) 0.745(0.002) 0.895(0.000) 0.773(0.001) 0.818(0.043) 0.924(0.000)
vowel0 0.626(0.001) 0.631(0.000) 0.631(0.000) 0.670(0.075) 0.495(0.000) 0.656(0.041) 0.842(0.015) 0.847(0.017)
balance-uni 0.631(0.002) 0.647(0.003) 0.647(0.002) 0.634(0.006) 0.838(0.000) 0.658(0.056) 0.855(0.006) 0.856(0.005)
shuttlec0vsc4 0.949(0.026) 0.976(0.002) 0.981(0.001) 0.963(0.030) 0.961(0.000) 0.951(0.023) 0.960(0.014) 0.991(0.020)
dermatology6 0.641(0.003) 0.684(0.007) 0.684(0.008) 0.695(0.061) 0.935(0.000) 0.733(0.087) 0.897(0.024) 0.952(0.029)
shuttle2vs5 0.984(0.001) 0.981(0.007) 0.673(0.003) 0.987(0.002) 0.989(0.000) 0.977(0.009) 0.986(0.004) 0.994(0.006)
Tab.3  The average F-score (std) for all datasets. The best result on each dataset is highlighted in bold
Datasets K-means Ncut+SR Rcut+SR JSESR PCOG CPCG SCCR CWLP
Glass 0.775(0.003) 0.746(0.005) 0.735(0.010) 0.775(0.003) 1.000(0.000) 0.784(0.000) 1.000(0.000) 1.000(0.000)
glass6 0.834(0.077) 0.900(0.006) 0.900(0.008) 0.841(0.077) 0.830(0.000) 0.865(0.060) 0.857(0.085) 0.964(0.000)
ecoli3 0.912(0.001) 0.916(0.004) 0.917(0.004) 0.912(0.001) 0.820(0.000) 0.913(0.001) 0.818(0.043) 0.924(0.000)
vowel0 0.834(0.002) 0.833(0.001) 0.833(0.002) 0.836(0.004) 0.830(0.000) 0.835(0.002) 0.842(0.015) 0.847(0.017)
balance-uni 0.855(0.002) 0.857(0.001) 0.857(0.001) 0.855(0.000) 0.855(0.000) 0.856(0.003) 0.855(0.006) 0.857(0.005)
shuttlec0vsc4 0.908(0.044) 0.956(0.002) 0.970(0.001) 0.950(0.018) 0.998(0.000) 0.908(0.044) 0.960(0.014) 0.991(0.020)
dermatology6 0.890(0.007) 0.875(0.009) 0.875(0.003) 0.903(0.026) 0.893(0.000) 0.914(0.035) 0.897(0.024) 0.952(0.029)
shuttle2vs5 0.973(0.001) 0.965(0.000) 0.965(0.000) 0.973(0.001) 0.971(0.000) 0.976(0.007) 0.985(0.006) 0.994(0.006)
Tab.4  The average Recall (std) for all datasets. The best result on each dataset is highlighted in bold
Datasets K-means Ncut+SR Rcut+SR JSESR PCOG CPCG SCCR CWLP
Glass 0.830(0.002) 0.808(0.002) 0.799(0.005) 0.829(0.002) 1.000(0.000) 0.834(0.000) 1.000(0.000) 1.000(0.000)
glass6 0.868(0.025) 0.864(0.004) 0.864(0.002) 0.878(0.018) 0.953(0.000) 0.902(0.031) 0.908(0.038) 0.972(0.000)
ecoli3 0.774(0.002) 0.783(0.005) 0.786(0.006) 0.774(0.002) 0.988(0.005) 0.801(0.001) 0.899(0.007) 0.934(0.000)
vowel0 0.529(0.007) 0.570(0.000) 0.570(0.000) 0.638(0.126) 0.485(0.000) 0.614(0.096) 0.909(0.000) 0.909(0.000)
balance-uni 0.514(0.016) 0.600(0.003) 0.600(0.007) 0.588(0.035) 0.867(0.040) 0.588(0.115) 0.922(0.003) 0.922(0.002)
shuttlec0vsc4 0.989(0.013) 0.978(0.004) 0.983(0.002) 0.969(0.021) 0.950(0.000) 0.991(0.011) 0.963(0.013) 0.992(0.018)
dermatology6 0.553(0.012) 0.693(0.002) 0.693(0.012) 0.672(0.106) 0.991(0.000) 0.731(0.120) 0.946(0.007) 0.962(0.016)
shuttle2vs5 0.777(0.116) 0.733(0.000) 0.733(0.000) 0.777(0.116) 1.000(0.000) 0.857(0.188) 0.987(0.002) 0.994(0.006)
Tab.5  The average Purity (std) for all datasets. The best result on each dataset is highlighted in bold
Fig.3  Convergence curve of the proposed CWLP on three datasets. (a) ecoli3; (b) vowel0; (c) dermatology6. These figures show the value of the objective function decreases rapidly and achieves convergence after a few iterations
Fig.4  The ACC (F-score) value as a function of β and γ for three datasets. (a) Glass; (b) glass6; (c) ecoli3
Datasets Methods ACC F-score Recall
WDBC K-means 0.886(0.001) 0.821(0.001) 0.778(0.001)
Ncut+SR 0.898(0.002) 0.833(0.002) 0.807(0.002)
Rcut+SR 0.899(0.001) 0.836(0.001) 0.809(0.002)
JSESR 0.883(0.002) 0.816(0.002) 0.771(0.004)
PCOG 0.633(0.000) 0.694(0.000) 0.533(0.000)
CPCG 0.882(0.002) 0.815(0.003) 0.774(0.004)
SCCR 0.890(0.002) 0.815(0.003) 0.815(0.003)
CWLP 0.912(0.009) 0.850(0.014) 0.850(0.014)
BCC K-means 0.509(0.000) 0.544(0.000) 0.497(0.000)
Ncut+SR 0.526(0.000) 0.497(0.000) 0.498(0.000)
Rcut+SR 0.534(0.000) 0.497(0.000) 0.499(0.000)
JSESR 0.509(0.000) 0.544(0.000) 0.497(0.000)
PCOG 0.534(0.000) 0.634(0.000) 0.499(0.000)
CPCG 0.552(0.000) 0.545(0.006) 0.502(0.000)
SCCR 0.514(0.018) 0.498(0.002) 0.498(0.002)
CWLP 0.566(0.040) 0.508(0.010) 0.508(0.010)
Tab.6  The average results (std) on two datasets are measured by ACC, F-score, and Recall. The best result on each dataset is highlighted in bold