Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236 (Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2010, Vol. 4, Issue 1: 89-99. https://doi.org/10.1007/s11704-009-0072-9
Research articles
Mining non-redundant diverse patterns: an information theoretic perspective
Chaofeng SHA¹, Jian GONG¹, Aoying ZHOU²
¹ School of Computer Science, Fudan University, Shanghai 200433, China
² Software Engineering Institute, East China Normal University, Shanghai 200062, China
Abstract: The discovery of diversity patterns from binary data is an important data mining task. In this paper, we introduce the problem of mining highly diverse patterns, called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. In addition, we propose an algorithm, NDP miner, that exploits both the monotonicity of the entropy-based diversity measure and its pruning power to discover non-redundant diversity patterns efficiently. Finally, experimental results show that NDP miner efficiently identifies non-redundant diversity patterns.
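The abstract rests on two properties of entropy as a diversity measure: the joint entropy of an itemset X over binary data, H(X) = -Σ p(x) log2 p(x) summed over the observed value combinations x of the items in X, and its monotonicity (adding an item to X never decreases H(X)), which lets a depth-first search stop extending an itemset once it is already diverse. The Python sketch below illustrates both under stated assumptions: rows are lists of 0/1 values, the threshold `tau` and the size limit `max_size` are hypothetical parameters, and "non-redundant" is read as "minimal" (no proper subset already reaches `tau`). This is an illustrative reading, not the NDP miner as defined in the paper.

```python
import math
from collections import Counter


def joint_entropy(rows, itemset):
    """Shannon joint entropy, in bits, of the binary columns listed in `itemset`."""
    n = len(rows)
    # Count each combination of values taken by the selected columns.
    counts = Counter(tuple(row[i] for i in itemset) for row in rows)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mine_minimal_diverse(rows, tau, max_size):
    """Depth-first search for minimal itemsets whose joint entropy is >= tau.

    Assumption (hypothetical, not the paper's definition): an itemset is
    "non-redundant" if none of its proper subsets already reaches tau.
    Assumes tau > 0.
    """
    n_items = len(rows[0])
    results = []

    def dfs(itemset, start):
        # Extend `itemset` only with items of larger index (lexicographic DFS).
        for j in range(start, n_items):
            candidate = itemset + (j,)
            h = joint_entropy(rows, candidate)
            if h >= tau:
                # Entropy never decreases when items are added, so every
                # superset of `candidate` also reaches tau: do not extend it.
                # By the same monotonicity, checking the immediate subsets
                # is enough to decide whether `candidate` is minimal.
                if all(joint_entropy(rows, candidate[:k] + candidate[k + 1:]) < tau
                       for k in range(len(candidate))):
                    results.append(candidate)
            elif len(candidate) < max_size:
                dfs(candidate, j + 1)

    dfs((), 0)
    return results
```

For the four-row toy table `[[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]`, each column alone carries 1 bit of entropy and every pair carries 2 bits, so `mine_minimal_diverse(rows, 1.5, 3)` returns `[(0, 1), (0, 2), (1, 2)]`; the full triple is never visited because its subsets already reach the threshold. The sketch recomputes each entropy from scratch for clarity; a practical miner would cache counts or use projected databases.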
Keywords: diverse pattern; entropy; depth-first search
Issue Date: 05 March 2010
Cite this article:
Chaofeng SHA, Jian GONG, Aoying ZHOU. Mining non-redundant diverse patterns: an information theoretic perspective[J]. Front. Comput. Sci., 2010, 4(1): 89-99.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-009-0072-9
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I1/89