Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (2) : 357-381    https://doi.org/10.1007/s11704-016-6245-4
RESEARCH ARTICLE
CLS-Miner: efficient and effective closed high-utility itemset mining
Thu-Lan DAM1,2, Kenli LI1,3,4(), Philippe FOURNIER-VIGER5, Quang-Huy DUONG6
1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
2. Faculty of Information Technology, Hanoi University of Industry, Hanoi, Vietnam
3. CIC of HPC, National University of Defense Technology, Changsha 410073, China
4. National Supercomputing Center in Changsha, Changsha 410082, China
5. School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
6. Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
 Download: PDF(963 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithmoutperforms the current state-ofthe- art CHUD and CHUI-Miner algorithms, and that CLSMiner scales linearly.

Keywords utility mining      high-utility itemset mining      closed itemset mining      closed high-utility itemset mining     
Corresponding Author(s): Kenli LI   
Just Accepted Date: 14 December 2016   Online First Date: 02 April 2018    Issue Date: 08 April 2019
 Cite this article:   
Thu-Lan DAM,Kenli LI,Philippe FOURNIER-VIGER, et al. CLS-Miner: efficient and effective closed high-utility itemset mining[J]. Front. Comput. Sci., 2019, 13(2): 357-381.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-6245-4
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I2/357
1 RAgrawal, R Srikant. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases. 1994, 487–499
2 M JZaki, KGouda. Fast vertical mining using diffsets. In: Proceedings of the 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 326–335
https://doi.org/10.1145/956750.956788
3 J WHan, JPei, Y WYin. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53–87
https://doi.org/10.1023/B:DAMI.0000005258.31418.83
4 JHan, JWang, YLu, P Tzvetkov. Mining top-k frequent closed patterns without minimum support. In: Proceedings of IEEE International Conference on Data Mining. 2002, 211–218
5 GGrahne, JZhu. Fast algorithms for frequent itemset mining using fptrees. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(10): 1347–1362
https://doi.org/10.1109/TKDE.2005.166
6 Z HDeng, S LLv. PrePost+: an efficient N-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Systems with Applications, 2015, 42(13): 5424–5432
https://doi.org/10.1016/j.eswa.2015.03.004
7 T LDam, KLi, PFournier-Viger, Q HDuong. An efficient algorithm for mining top-rank-k frequent patterns. Applied Intelligence, 2016, 45(1): 96–111
https://doi.org/10.1007/s10489-015-0748-9
8 YLiu, W KLiao, AChoudhary. A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2005, 689–695
https://doi.org/10.1007/11430919_79
9 MLiu, JQu. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 55–64
https://doi.org/10.1145/2396761.2396773
10 VTseng, B EShie, C WWu, P Yu. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1772–1786
https://doi.org/10.1109/TKDE.2012.59
11 PFournier-Viger, C W Wu, SZida, VTseng. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of International Symposium on Methodologies for Intelligent Systems. 2014, 83–92
https://doi.org/10.1007/978-3-319-08326-1_9
12 WSong, YLiu, JLi. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. International Journal of Data Warehousing and Mining, 2014, 10(1): 1–15
https://doi.org/10.4018/ijdwm.2014010101
13 WSong, ZZhang, JLi. A high utility itemset mining algorithm based on subsume index. Knowledge and Information Systems, 2016, 49(1): 315–340
14 Q HDuong, BLiao, PFournier-Viger, T LDam. An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowledge-Based Systems, 2016, 104: 106–122
https://doi.org/10.1016/j.knosys.2016.04.016
15 CAhmed, S Tanbeer, B SJeong, Y KLee. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708–1721
https://doi.org/10.1109/TKDE.2009.46
16 QYang. Three challenges in data mining. Frontiers of Computer Science, 2010, 4(3): 324–333
https://doi.org/10.1007/s11704-010-0102-7
17 JChen, KLi, ZTang, K Bilal, SYu, CWeng, KLi. A parallel random forest algorithm for big data in a spark cloud computing environment. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(4): 919–933
https://doi.org/10.1109/TPDS.2016.2603511
18 WSong, YLiu, JLi. Mining high utility itemsets by dynamically pruning the tree structure. Applied Intelligence, 2014, 40(1): 29–43
https://doi.org/10.1007/s10489-013-0443-7
19 PFournier-Viger, J C W Lin, Q HDuong, T LDam. In: Fujita H, Ali M, Selamat A, et al, eds. FHM+: Faster High-Utility Itemset Mining Using Length Upper-Bound Reduction. Cham: Springer International 2016, 115–127
20 V STseng, C WWu, PFournier-Viger, P SYu. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(3): 726–739
https://doi.org/10.1109/TKDE.2014.2345377
21 NPasquier, Y Bastide, RTaouil, LLakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 1999, 24(1): 25–46
https://doi.org/10.1016/S0306-4379(99)00003-4
22 CLucchese, S Orlando, RPerego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 21–36
https://doi.org/10.1109/TKDE.2006.10
23 M JZaki, C JHsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 462–478
https://doi.org/10.1109/TKDE.2005.60
24 PFournier-Viger. FHN: efficient mining of high-utility itemsets with negative unit profits. In: Proceedings of International Conference on Advanced Data Mining and Applications. 2014, 16–29
https://doi.org/10.1007/978-3-319-14717-8_2
25 VTseng, C WWu, PFournier-Viger, PYu. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 54–67
https://doi.org/10.1109/TKDE.2015.2458860
26 C WWu, P Fournier-Viger, J YGu, V STseng. Mining closed+ high utility itemsets without candidate generation. In: Proceedings of Conference on Technologies and Applications of Artificial Intelligence. 2015, 187–194
https://doi.org/10.1109/TAAI.2015.7407089
27 RChan, QYang, Y DShen. Mining high utility itemsets. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 19–26
https://doi.org/10.1109/ICDM.2003.1250893
28 G CLan, T PHong, V STseng. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems, 2014, 38(1): 85–107
https://doi.org/10.1007/s10115-012-0492-y
29 KGouda, M JZaki. Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 2005, 11(3): 223–242
https://doi.org/10.1007/s10618-005-0002-x
30 TUno, MKiyomi, HArimura. LCMver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations. 2004
31 LSzathmary, P Valtchev, ANapoli, RGodin, ABoc, VMakarenkov. A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Annals of Mathematics and Artificial Intelligence, 2014, 70(1–2): 81–105
https://doi.org/10.1007/s10472-013-9372-8
32 PFournier-Viger, C W Wu, V STseng. Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the International Conference on Advanced Data Mining and Applications. 2014, 30–43
https://doi.org/10.1007/978-3-319-14717-8_3
33 B EShie, S YPhilip, V STseng. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 2012, 39(17): 12947–12960
https://doi.org/10.1016/j.eswa.2012.05.035
34 NPasquier, Y Bastide, RTaouil, LLakhal. Discovering frequent closed itemsets for association rules. In: Proceedings of the International Conference on Database Theory. 1999, 398–416
https://doi.org/10.1007/3-540-49257-7_25
35 PFournier-Viger, A Gomariz, TGueniche, ASoltani, C WWu, V STseng. SPMF: a Java open-source pattern mining library. Journal of Machine Learning Research, 2014, 15: 3569–3573
36 JPisharath, YLiu, W KLiao, A Choudhary, GMemik, JParhi. NUMineBench 2.0. CUCIS Technical Report CUCIS-2005-08-01. 2005
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed