Validation indices for projective clustering

Lifei CHEN 1, Shanjun HE 2, Qingshan JIANG 3
1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350108, China; 2. Longyan Tobacco Industrial Co., Ltd, Longyan 364010, China; 3. Chengdu University, Chengdu 610106, China; Software School, Xiamen University, Xiamen 361005, China

Abstract Cluster validation, the process of evaluating the performance of clustering algorithms under varying input conditions, is a major issue in cluster analysis for data mining. Many existing validity indices address clustering results on low-dimensional data. In high-dimensional data, many of the dimensions are irrelevant, and clusters usually exist only in projected subspaces spanned by different combinations of dimensions. This paper presents a solution to the problem of cluster validation for projective clustering. We propose two new measures of the intracluster compactness and intercluster separation of projected clusters. Based on these measures and the conventional indices, we present three new cluster validity indices. Combined with a fuzzy projective clustering algorithm, the new indices are used to determine the number of projected clusters in high-dimensional data. An empirical study on synthetic and real-world datasets demonstrates the suitability of the proposal.
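
The general idea of using a validity index to choose the number of clusters can be illustrated with a minimal sketch. It does not reproduce the indices proposed in this paper, whose definitions are not given in the abstract. As stated assumptions, plain k-means stands in for the fuzzy projective clustering algorithm, distances are taken in the full space rather than in projected subspaces, and the index is a conventional crisp compactness-to-separation ratio in the style of Xie and Beni. The names `kmeans`, `ratio_index`, `select_k`, and `POINTS` are hypothetical.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest centre for every point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Assumption: a stand-in for the fuzzy projective algorithm in the paper.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing centre.
        centers.append(max(points,
                           key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def ratio_index(points, centers, labels):
    """Compactness / separation; smaller values indicate a better partition."""
    # Compactness: mean squared distance of each point to its own centre.
    compactness = sum(sqdist(p, centers[lab])
                      for p, lab in zip(points, labels)) / len(points)
    # Separation: smallest squared distance between two distinct centres.
    separation = min(sqdist(centers[i], centers[j])
                     for i in range(len(centers))
                     for j in range(i + 1, len(centers)))
    return compactness / separation if separation > 0 else float("inf")


def select_k(points, candidates):
    """Return (best_k, {k: index value}) over candidate cluster counts (k >= 2)."""
    scores = {}
    for k in candidates:
        centers, labels = kmeans(points, k)
        scores[k] = ratio_index(points, centers, labels)
    return min(scores, key=scores.get), scores


# Toy data (assumed for illustration): three tight 2-D groups.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]

best_k, scores = select_k(POINTS, range(2, 6))
print(best_k, scores)
```

The scan clusters the data once per candidate number of clusters and keeps the candidate with the smallest index value. A projective variant would presumably restrict both the compactness and the separation terms to each cluster's relevant dimensions, which is the gap the indices in this paper are meant to fill.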

Keywords data mining, cluster validation, projective clustering, cluster validity index

Issue Date: 05 December 2009