|
|
CLS-Miner: efficient and effective closed high-utility itemset mining |
Thu-Lan DAM1,2, Kenli LI1,3,4(), Philippe FOURNIER-VIGER5, Quang-Huy DUONG6 |
1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China 2. Faculty of Information Technology, Hanoi University of Industry, Hanoi, Vietnam 3. CIC of HPC, National University of Defense Technology, Changsha 410073, China 4. National Supercomputing Center in Changsha, Changsha 410082, China 5. School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China 6. Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway |
|
|
Abstract High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithmoutperforms the current state-ofthe- art CHUD and CHUI-Miner algorithms, and that CLSMiner scales linearly.
|
Keywords
utility mining
high-utility itemset mining
closed itemset mining
closed high-utility itemset mining
|
Corresponding Author(s):
Kenli LI
|
Just Accepted Date: 14 December 2016
Online First Date: 02 April 2018
Issue Date: 08 April 2019
|
|
1 |
RAgrawal, R Srikant. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases. 1994, 487–499
|
2 |
M JZaki, KGouda. Fast vertical mining using diffsets. In: Proceedings of the 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 326–335
https://doi.org/10.1145/956750.956788
|
3 |
J WHan, JPei, Y WYin. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53–87
https://doi.org/10.1023/B:DAMI.0000005258.31418.83
|
4 |
JHan, JWang, YLu, P Tzvetkov. Mining top-k frequent closed patterns without minimum support. In: Proceedings of IEEE International Conference on Data Mining. 2002, 211–218
|
5 |
GGrahne, JZhu. Fast algorithms for frequent itemset mining using fptrees. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(10): 1347–1362
https://doi.org/10.1109/TKDE.2005.166
|
6 |
Z HDeng, S LLv. PrePost+: an efficient N-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Systems with Applications, 2015, 42(13): 5424–5432
https://doi.org/10.1016/j.eswa.2015.03.004
|
7 |
T LDam, KLi, PFournier-Viger, Q HDuong. An efficient algorithm for mining top-rank-k frequent patterns. Applied Intelligence, 2016, 45(1): 96–111
https://doi.org/10.1007/s10489-015-0748-9
|
8 |
YLiu, W KLiao, AChoudhary. A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2005, 689–695
https://doi.org/10.1007/11430919_79
|
9 |
MLiu, JQu. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 55–64
https://doi.org/10.1145/2396761.2396773
|
10 |
VTseng, B EShie, C WWu, P Yu. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1772–1786
https://doi.org/10.1109/TKDE.2012.59
|
11 |
PFournier-Viger, C W Wu, SZida, VTseng. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of International Symposium on Methodologies for Intelligent Systems. 2014, 83–92
https://doi.org/10.1007/978-3-319-08326-1_9
|
12 |
WSong, YLiu, JLi. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. International Journal of Data Warehousing and Mining, 2014, 10(1): 1–15
https://doi.org/10.4018/ijdwm.2014010101
|
13 |
WSong, ZZhang, JLi. A high utility itemset mining algorithm based on subsume index. Knowledge and Information Systems, 2016, 49(1): 315–340
|
14 |
Q HDuong, BLiao, PFournier-Viger, T LDam. An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowledge-Based Systems, 2016, 104: 106–122
https://doi.org/10.1016/j.knosys.2016.04.016
|
15 |
CAhmed, S Tanbeer, B SJeong, Y KLee. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708–1721
https://doi.org/10.1109/TKDE.2009.46
|
16 |
QYang. Three challenges in data mining. Frontiers of Computer Science, 2010, 4(3): 324–333
https://doi.org/10.1007/s11704-010-0102-7
|
17 |
JChen, KLi, ZTang, K Bilal, SYu, CWeng, KLi. A parallel random forest algorithm for big data in a spark cloud computing environment. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(4): 919–933
https://doi.org/10.1109/TPDS.2016.2603511
|
18 |
WSong, YLiu, JLi. Mining high utility itemsets by dynamically pruning the tree structure. Applied Intelligence, 2014, 40(1): 29–43
https://doi.org/10.1007/s10489-013-0443-7
|
19 |
PFournier-Viger, J C W Lin, Q HDuong, T LDam. In: Fujita H, Ali M, Selamat A, et al, eds. FHM+: Faster High-Utility Itemset Mining Using Length Upper-Bound Reduction. Cham: Springer International 2016, 115–127
|
20 |
V STseng, C WWu, PFournier-Viger, P SYu. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(3): 726–739
https://doi.org/10.1109/TKDE.2014.2345377
|
21 |
NPasquier, Y Bastide, RTaouil, LLakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 1999, 24(1): 25–46
https://doi.org/10.1016/S0306-4379(99)00003-4
|
22 |
CLucchese, S Orlando, RPerego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 21–36
https://doi.org/10.1109/TKDE.2006.10
|
23 |
M JZaki, C JHsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 462–478
https://doi.org/10.1109/TKDE.2005.60
|
24 |
PFournier-Viger. FHN: efficient mining of high-utility itemsets with negative unit profits. In: Proceedings of International Conference on Advanced Data Mining and Applications. 2014, 16–29
https://doi.org/10.1007/978-3-319-14717-8_2
|
25 |
VTseng, C WWu, PFournier-Viger, PYu. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 54–67
https://doi.org/10.1109/TKDE.2015.2458860
|
26 |
C WWu, P Fournier-Viger, J YGu, V STseng. Mining closed+ high utility itemsets without candidate generation. In: Proceedings of Conference on Technologies and Applications of Artificial Intelligence. 2015, 187–194
https://doi.org/10.1109/TAAI.2015.7407089
|
27 |
RChan, QYang, Y DShen. Mining high utility itemsets. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 19–26
https://doi.org/10.1109/ICDM.2003.1250893
|
28 |
G CLan, T PHong, V STseng. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems, 2014, 38(1): 85–107
https://doi.org/10.1007/s10115-012-0492-y
|
29 |
KGouda, M JZaki. Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 2005, 11(3): 223–242
https://doi.org/10.1007/s10618-005-0002-x
|
30 |
TUno, MKiyomi, HArimura. LCMver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations. 2004
|
31 |
LSzathmary, P Valtchev, ANapoli, RGodin, ABoc, VMakarenkov. A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Annals of Mathematics and Artificial Intelligence, 2014, 70(1–2): 81–105
https://doi.org/10.1007/s10472-013-9372-8
|
32 |
PFournier-Viger, C W Wu, V STseng. Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the International Conference on Advanced Data Mining and Applications. 2014, 30–43
https://doi.org/10.1007/978-3-319-14717-8_3
|
33 |
B EShie, S YPhilip, V STseng. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 2012, 39(17): 12947–12960
https://doi.org/10.1016/j.eswa.2012.05.035
|
34 |
NPasquier, Y Bastide, RTaouil, LLakhal. Discovering frequent closed itemsets for association rules. In: Proceedings of the International Conference on Database Theory. 1999, 398–416
https://doi.org/10.1007/3-540-49257-7_25
|
35 |
PFournier-Viger, A Gomariz, TGueniche, ASoltani, C WWu, V STseng. SPMF: a Java open-source pattern mining library. Journal of Machine Learning Research, 2014, 15: 3569–3573
|
36 |
JPisharath, YLiu, W KLiao, A Choudhary, GMemik, JParhi. NUMineBench 2.0. CUCIS Technical Report CUCIS-2005-08-01. 2005
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|