|
|
Mining non-redundant diverse patterns: an information
theoretic perspective |
Chaofeng SHA1,Jian GONG1,Aoying ZHOU2, |
1.School of Computer Science,
Fudan University, Shanghai 200433, China; 2.Software Engineering
Institute, East China Normal University, Shanghai 200062, China; |
|
|
Abstract The discovery of diversity patterns from binary data is an important data mining task. In this paper, we propose the problem of mining highly diverse patterns called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. In addition, an algorithm called NDP miner is proposed to exploit both monotone properties of entropy diversity measure and pruning power for the efficient discovery of non-redundant diversity patterns. Finally, our experimental results are given to show that the NDP miner can efficiently identify non-redundant diversity patterns.
|
Keywords
diverse pattern
entropy
depth-first search
|
Issue Date: 05 March 2010
|
|
|
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in largedatabases. In: Proceedings of SIGMOD’93. 1993, 207―216
|
|
Brin S, Motwani R, Silverstein C. Beyond market baskets: generalizing association rulesto correlations. In: Proceedings of SIGMOD’97. 1997, 265―276
|
|
Pan F, Roberts A, McMillan L, et al. Sample selection for maximal diversity. In: Proceedings of ICDM’07. 2007, 262―271
|
|
Cheng H, Yan X, Han J, et al. Discriminative frequent pattern analyis foreffective classification. In: Proceedingsof ICDE’07. 2007, 716―725
|
|
Zhang X, Pan F, Wang W, et al. Mining non-redundant high order correlationin binary data. In: Proceedings of VLDB’08. 2008, 1178–1188
|
|
Pardo L. StatisticalInference Based on Divergence Measures. Chapman-Hall/CRC, 2005
|
|
Machanavajjhala A, Gehrke J, Kifer D, et al. l-diversity:privacy beyond k-anonymity. In: Proceedings of ICDE’06. 2006, 24―35
|
|
Agrawal R, Srikant R. Fast algorithms for miningassociation rules in large databases. In: Proceedings of VLDB’94. 1994, 487―499
|
|
Omiecinski E R. Alternative interest measures for mining associations in databases. IEEE Transactions on Data Engineering, 2003, 15(1): 57―69
|
|
Ke Y, Cheng J, Ng W. Mining quantitative correlated patterns using an information-theoreticapproach. In: Proceedings of KDD’06. 2006, 227―236
|
|
Knobbe A, Ho E. Maximally informative k-itemsets and their efficient discovery. In: Proceedings of KDD’06. 2006, 237―244
|
|
Heikinheimo H, Hinkkanen E, Mannila H, et al. Finding low-entropy sets and trees from binarydata. In: Proceedings of KDD’07. 2007, 350―359
|
|
Pan F, Wang W, Tung A K H, et al. Finding representative set from massive data. In: Proceedings of ICDM’05. 2005, 338―345
|
|
Guyon I, Elisseeff A. An introduction to variableand feature selection. Journal of MachineLearning Research, 2003, 1157―1182
|
|
Koller D, Sahami M. Toward optimal feature selection. In: Proceedings of ICML’96. 1996, 284―292
|
|
Cover T, Thomas J. Elements of Information Theory. Wiley Interscience, 1991
|
|
Yeung R W. A First Course in Information Theory. Springer, 2002
|
|
Han T S. Nonnegative entropy measures of multivariate symmetric correlations. Inform. Contr., 1978, 36: 133―156
|
|
Cormen T H, Leiserson C E, Rivest R L, et al. Introduction to Algorithms. 2nd ed. MA: MIT Press, 2001
|
|
Witten I H, Frank E. Data Mining: Practical MachineLearning Tools and Tech-niques. 2nd ed. San Francisco: Morgan Kaufmann, 2005
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|