Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2014, Vol. 8 Issue (4) : 563-580    https://doi.org/10.1007/s11704-014-3105-y
RESEARCH ARTICLE
Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams
Min XU1,2,Hisao ISHIBUCHI3,Xin GU1,Shitong WANG1,*()
1. School of Digital Media, Jiangnan University,Wuxi 214122, China
2. School of Internet of Things Technology,Wuxi Institute of Technology,Wuxi 214121, China
3. Department of Computer Science, Osaka Prefecture University, Osaka 599-8531, Japan
 Download: PDF(910 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

In many data stream mining applications, traditional density estimation methods such as kernel density estimation, reduced set density estimation can not be applied to the density estimation of data streams because of their high computational burden, processing time and intensive memory allocation requirement. In order to reduce the time and space complexity, a novel density estimation method Dm-KDE over data streams based on the proposed algorithm m-KDE which can be used to design a KDE estimator with the fixed number of kernel components for a dataset is proposed. In this method, Dm-KDE sequence entries are created by algorithm m-KDE instead of all kernels obtained from other density estimation methods. In order to further reduce the storage space, Dm-KDE sequence entries can be merged by calculating their KL divergences. Finally, the probability density functions over arbitrary time or entire time can be estimated through the obtained estimation model. In contrast to the state-of-the-art algorithm SOMKE, the distinctive advantage of the proposed algorithm Dm-KDE exists in that it can achieve the same accuracy with much less fixed number of kernel components such that it is suitable for the scenarios where higher on-line computation about the kernel density estimation over data streams is required.We compare Dm-KDE with SOMKE and M-kernel in terms of density estimation accuracy and running time for various stationary datasets. We also apply Dm-KDE to evolving data streams. Experimental results illustrate the effectiveness of the proposed method.

Keywords kernel density estimation      Kullback–Leibler divergence      data streams      kernel width      time and space complexity     
Corresponding Author(s): Shitong WANG   
Issue Date: 11 August 2014
 Cite this article:   
Min XU,Hisao ISHIBUCHI,Xin GU, et al. Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams[J]. Front. Comput. Sci., 2014, 8(4): 563-580.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-014-3105-y
https://academic.hep.com.cn/fcs/EN/Y2014/V8/I4/563
1 Domingos P, Hulten G. A general framework for mining massive data stream. Journal of Computational and Graphical Statistics, 2003, 12(4): 945-949
doi: 10.1198/1061860032544
2 Aggarwal C C, Han J, Wang J Y, Yu P S. A framework for on-demand classification of evolving data streams. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(5): 577-589
doi: 10.1109/TKDE.2006.69
3 Elwell R, Polikar R. Incremental learning of concept drift in nonstationary environments. IEEE Transaction on Neural Networks, 2011, 22(10): 1517-1531
doi: 10.1109/TNN.2011.2160459
4 Lazar A A, Pnevatikakis E A. Video time encoding machines. IEEE Transaction on Neural Networks, 2011, 22(3): 461-473
doi: 10.1109/TNN.2010.2103323
5 Rogister P, Benosman R, Ieng S H, Lichtsteiner P, Delbruck T. Asynchronous event-based binocular stereo matching. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(2): 347-353
doi: 10.1109/TNNLS.2011.2180025
6 Domingos P, Hulten G. Catching up with the data: research issues in mining data streams. In: Workshop on Research Issues in Data Mining and Knowledge Discovery. 2001, 1-5
7 Martinez W L, Martinez A R. Computational statistics handbook with MATLAB. London: Chapman & Hall, 2008
8 Heinz C, Seeger B. Cluster kernels: resource-aware kernel density estimators over streaming data. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(7): 880-893
doi: 10.1109/TKDE.2008.21
9 H?m?l?inen A. Self-organizing map and reduced kernel density estimation. PhD Thesis. Jyv?skyl?: University of Jyv?skyl?, 1995.
10 Girolami M, He C. Probability density estimation from optimally condensed data samples. Pattern Analysis and Machine Intelligence, 2003, 25(10): 1253-1264
doi: 10.1109/TPAMI.2003.1233899
11 Deng Z H, Chung F L, Wang S T. FRSDE: fast reduced set density estimator using minimal enclosing ball approximation. Pattern Recognition, 2008, 41(4): 1363-1372
doi: 10.1016/j.patcog.2007.09.013
12 Deng Z H, Kup-Sze C, Chung F L, Wang S T. Scalable TSK fuzzy modeling for very large datasets using minimal-enclosing-ball approximation. IEEE Transactions on Fuzzy Systems, 2011, 19(2): 210-226
doi: 10.1109/TFUZZ.2010.2091961
13 Qian P J, Chung F L, Wang S T, Deng Z H. Fast graph-based relaxed clustering for large data sets using minimal enclosing ball. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2012, 42(3): 672-687
doi: 10.1109/TSMCB.2011.2172604
14 Cai Z, Qian W, Wei L, Zhou A. M-kernel merging: toward density estimation over data streams. In: Proceedings of the 18th International Conference on Database System Advances. 2003, 285-292
15 Heinz C, Seeger B. Toward kernel density estimation over streaming data. In: Proceedings of International Conference on Management Data. 2006, 1-12
16 Cao Y, He H B, Man H. SOMKE: kernel density estimation over data streams by sequences of self-organizing maps. IEEE Transaction on Neural Network and Learning Systems, 2012, 23(8): 1254-1268
doi: 10.1109/TNNLS.2012.2201167
17 Kollios G, Gunopulos D. Efficient biased sampling for approximatie clustering and outlier detection in large datasets. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(5): 1170-1187
doi: 10.1109/TKDE.2003.1232271
18 He H, Chen S, Li K, Xu X. Incremental learning from stream data. IEEE Transaction on Neural Network, 2011, 22(12): 1901-1914
doi: 10.1109/TNN.2011.2171713
19 Kullback S, Leibler R A. On information and sufficiency. The Annals of Mathematical Statistics, 1951, 22(1): 79-86
doi: 10.1214/aoms/1177729694
20 Cao Y, He H, Man H, Shen X. Integration of self-organizing map (SOM) and kernel density estimation (KDE) for network intrusion detection. In Proceedings of SPIE, 2009, 1-12
21 Yang Y, Liu Y Y. An improved background and foreground modeling using kernel density estimation in moving object detection. In: Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT). 2011, 1050-1054
22 DiNardo J, Tobias J L. Nonparametric density and regression estimation. The Journal of Economic Perspectives, 2001, 15(4): 11-28
doi: 10.1257/jep.15.4.11
23 Parzen E. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 1962, 33(3): 1065-1076
doi: 10.1214/aoms/1177704472
24 Silverman B W. Density estimation for statistics and data analysis. London: <PublisherName Language="chs">Chapma<?Pub Caret?>n & Hall</PublisherName>, 1986
doi: 10.1007/978-1-4899-3324-9
25 Jones M C, Marron J S, Sheather S J. A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 1996, 91(433): 401-407
doi: 10.1080/01621459.1996.10476701
26 Raykar V C, Duraiswami R. Fast optimal bandwidth selection for kernel density estimation. In: Proceedings of 6th SIAMInternational Conference on Data Mining. 2006, 524-528
27 Kanungo T, Mount D M, Netanyahu N S, Piatko C D, Silverman R, Wu A Y. An efficient K-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 881-892
doi: 10.1109/TPAMI.2002.1017616
[1] Yuwei WANG, Yuanchun ZHOU, Ying LIU, Ze LUO, Danhuai GUO, Jing SHAO, Fei TAN, Liang WU, Jianhui LI, Baoping YAN. A grid-based clustering algorithm for wild bird distribution[J]. Front Comput Sci, 2013, 7(4): 475-485.
[2] Robert GWADERA. Multi-stream join answering for mining significant cross-stream correlations[J]. Front Comput Sci, 2012, 6(2): 131-142.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed