Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP


Front. Comput. Sci.    2009, Vol. 3 Issue (4) : 477-484    https://doi.org/10.1007/s11704-009-0051-1
Research articles
Validation indices for projective clustering
Lifei CHEN1, Shanjun HE2, Qingshan JIANG3
1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350108, China; 2. Longyan Tobacco Industrial Co., Ltd, Longyan 364010, China; 3. Chengdu University, Chengdu 610106, China; Software School, Xiamen University, Xiamen 361005, China
Abstract Cluster validation, the process of evaluating the performance of clustering algorithms under varying input conditions, is a major issue in the cluster analysis of data mining. Most existing validity indices address clustering results on low-dimensional data. In high-dimensional data, however, many of the dimensions are irrelevant, and clusters usually exist only in subspaces spanned by different combinations of dimensions. This paper presents a solution to the problem of cluster validation for projective clustering. We propose two new measurements for the intracluster compactness and intercluster separation of projected clusters. Based on these measurements and the conventional indices, three new cluster validity indices are presented. Combined with a fuzzy projective clustering algorithm, the new indices are used to determine the number of projected clusters in high-dimensional data. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real-world datasets.
Keywords: data mining; cluster validation; projective clustering; cluster validity index
Issue Date: 05 December 2009
 Cite this article:   
Lifei CHEN,Shanjun HE,Qingshan JIANG. Validation indices for projective clustering[J]. Front. Comput. Sci., 2009, 3(4): 477-484.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-009-0051-1
https://academic.hep.com.cn/fcs/EN/Y2009/V3/I4/477
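The abstract describes validity indices built from a weighted intracluster compactness and intercluster separation for projected clusters. The paper's exact definitions are not reproduced on this page, so the following is only a minimal sketch of the general idea, assuming a Xie-Beni-style ratio in which each cluster carries a feature-weight vector that encodes its projected subspace. The function name `projected_xb_index` and the weight matrix `W` are illustrative, not the authors' notation.

```python
import numpy as np

def projected_xb_index(X, U, C, W, m=2.0):
    """Xie-Beni-style validity index adapted to projected clusters (sketch).

    X : (n, d) data matrix
    U : (K, n) fuzzy membership matrix (columns sum to 1)
    C : (K, d) cluster centers
    W : (K, d) per-cluster feature weights (rows sum to 1), encoding
        the projected subspace that each cluster lives in
    m : fuzzifier exponent

    A smaller value indicates compact, well-separated projected clusters;
    scanning K and picking the minimum estimates the number of clusters.
    """
    n = X.shape[0]
    # Intracluster compactness: fuzzy sum of feature-weighted squared distances
    d2 = ((X[None, :, :] - C[:, None, :]) ** 2 * W[:, None, :]).sum(axis=2)  # (K, n)
    compactness = ((U ** m) * d2).sum()
    # Intercluster separation: minimum feature-weighted distance between centers,
    # using the symmetrised projection weights of the two clusters involved
    K = C.shape[0]
    sep = np.inf
    for k in range(K):
        for l in range(K):
            if k != l:
                w = 0.5 * (W[k] + W[l])
                sep = min(sep, float((w * (C[k] - C[l]) ** 2).sum()))
    return compactness / (n * sep)
```

In use, one would run the fuzzy projective clustering algorithm for each candidate K, evaluate the index on the resulting memberships, centers, and feature weights, and select the K that minimizes it.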