Mining non-redundant diverse patterns: an information theoretic perspective

doi:10.1007/s11704-009-0072-9

Front. Comput. Sci. 2010, Vol. 4, Issue (1): 89-99. https://doi.org/10.1007/s11704-009-0072-9

Research articles


Chaofeng SHA¹, Jian GONG¹, Aoying ZHOU²

1. School of Computer Science, Fudan University, Shanghai 200433, China; 2. Software Engineering Institute, East China Normal University, Shanghai 200062, China


Abstract: The discovery of diversity patterns from binary data is an important data mining task. In this paper, we formulate the problem of mining highly diverse patterns, called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. In addition, we propose an algorithm called NDP miner, which exploits both the monotone properties of the entropy diversity measure and pruning power to discover non-redundant diversity patterns efficiently. Finally, experimental results show that the NDP miner can efficiently identify non-redundant diversity patterns.

Keywords: diverse pattern; entropy; depth-first search
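
The abstract states that entropy measures the diversity of an itemset and that the miner relies on monotone properties of that measure. The sketch below illustrates one plausible reading, assuming that "entropy of an itemset" means the empirical joint entropy of the corresponding binary columns; this page does not give the paper's exact definition. The function name `itemset_entropy` and the toy dataset are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def itemset_entropy(rows, itemset):
    """Empirical joint entropy (in bits) of the columns listed in `itemset`.

    Assumption: diversity of an itemset = joint entropy of its columns.
    """
    items = sorted(itemset)
    # Count how often each 0/1 combination of the chosen columns occurs.
    counts = Counter(tuple(row[i] for i in items) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy data: one row per transaction, one column per item.
# Column 2 is the complement of column 0, so it carries no extra information.
rows = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 0),
]

h_a = itemset_entropy(rows, {0})      # single item
h_ab = itemset_entropy(rows, {0, 1})  # adds an independent item
h_ac = itemset_entropy(rows, {0, 2})  # adds a redundant item

print(h_a, h_ab, h_ac)
```

On this toy data, adding the independent column raises the entropy from 1 bit to 2 bits, while adding the redundant column leaves it at 1 bit. In both cases the superset's entropy is no smaller than the subset's, which is the general monotonicity of joint entropy. A search procedure could use this property for pruning; how the NDP miner does so is not described on this page.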

Issue Date: 05 March 2010

Cite this article:

Chaofeng SHA, Jian GONG, Aoying ZHOU. Mining non-redundant diverse patterns: an information theoretic perspective[J]. Front. Comput. Sci., 2010, 4(1): 89-99.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-009-0072-9
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I1/89

