Fast correlation coefficient estimation algorithm for HBase-based massive time series data
Wen LIU1,2, Tuqian ZHANG2, Yanming SHEN2, Peng WANG3
1. Department of Electrical and Information Engineering, Xinjiang Institute of Engineering, Urumqi 830091, China 2. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China 3. School of Computer Science, Fudan University, Shanghai 201203, China
Abstract In recent years, the rapid development of the Internet of Things and sensor networks has led to explosive growth in time series data. OpenTSDB and other emerging systems have begun to use Hadoop and HBase to store massive time series data, and how to query and mine time series data on these platforms has become a research hotspot. As a typical distance measure for time series, the correlation coefficient is widely used in various applications. However, computing the correlation coefficient of long time series on HBase in real time requires a large amount of I/O and network transmission, so it cannot be applied to interactive queries. To address this problem, we present two methods for estimating the correlation coefficient of two sequences on HBase. We first propose DCE, a fast algorithm that estimates the upper and lower bounds of the correlation coefficient. To further reduce the I/O cost, we extend DCE into the ADCE algorithm, which estimates the correlation coefficient quickly in an iterative manner. Experiments show that the proposed algorithms can quickly calculate the correlation coefficient of long time series.
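For reference, the quantity being estimated can be written out directly. The following is a minimal sketch of the standard Pearson correlation coefficient, computed in a single pass over two equal-length sequences. It is not the DCE or ADCE algorithm, whose details are not given in this abstract; the function name `pearson` and the in-memory list inputs are illustrative assumptions. The sketch reads every point of both sequences, which is the kind of full scan that becomes costly when the data reside in HBase.

```python
import math


def pearson(xs, ys):
    """Exact Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration only; not the paper's DCE/ADCE.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("sequences must have equal length of at least 2")

    n = len(xs)
    # Running sums: the only state needed for an exact one-pass computation.
    sx = sy = sxx = syy = sxy = 0.0
    for x, y in zip(xs, ys):
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y

    cov = n * sxy - sx * sy      # n^2 times the covariance
    var_x = n * sxx - sx * sx    # n^2 times the variance of xs
    var_y = n * syy - sy * sy    # n^2 times the variance of ys
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
    print(pearson([1, 2, 3], [3, 2, 1]))
```

Because the cross term `sxy` needs aligned points from both sequences, an exact answer requires reading both series in full, which motivates estimating bounds instead.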
Keywords
time series
HBase
correlation coefficient
fast estimation
Corresponding Author(s):
Yanming SHEN
Just Accepted Date: 04 September 2017
Online First Date: 07 September 2018
Issue Date: 29 May 2019