Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2019, Vol. 13 Issue (3) : 647-664    https://doi.org/10.1007/s11704-018-7234-6
RESEARCH ARTICLE
Distributed top-k similarity query on big trajectory streams
Zhigang ZHANG1, Xiaodong QI1, Yilin WANG1, Cheqing JIN1, Jiali MAO1,2(), Aoying ZHOU2
1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
2. Computer School, China West Normal University, Nanchong 637009, China
 Download: PDF(742 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Recently, big trajectory data streams are generated in distributed environmentswith the popularity of smartphones and other mobile devices. Distributed top-k similarity query, which finds k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth resource. Although this query can be solved by sending the query trajectory to all the remote sites, in which the pairwise similarities are computed precisely. However, the overall cost, O(n · m), is huge when n or m is huge, where n is the size of query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter some trajectories in advance without precise computation. In order to overcome the challenge in this query, we devise two general frameworks, into which concrete distance measures can be plugged. The former one uses two bounds (the upper and lower bound), while the latter one only uses the lower bound. Moreover, we introduce detailed implementations of two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bound for the former framework and the lower bound for the latter one. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of proposed methods.

Keywords top-k similarity query      trajectory stream      communication cost     
Corresponding Author(s): Jiali MAO   
Just Accepted Date: 28 May 2018   Online First Date: 12 November 2018    Issue Date: 24 April 2019
 Cite this article:   
Zhigang ZHANG,Xiaodong QI,Yilin WANG, et al. Distributed top-k similarity query on big trajectory streams[J]. Front. Comput. Sci., 2019, 13(3): 647-664.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-018-7234-6
https://academic.hep.com.cn/fcs/EN/Y2019/V13/I3/647
1 J PDai, JTeng, XBai, Z H Shen, DXuan. Mobile phone based drunk driving detection. In: Proceedings of the 4th International Conference on Pervasive Computing Technologies for Healthcare. 2010, 1–8
https://doi.org/10.4108/ICST.PERVASIVEHEALTH2010.8901
2 DZeinalipour Yazti, C Laoudias, CCosta, MVlachos, M I Andreou, DGunopulos. Crowdsourced trace similarity with smartphones. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(6): 1240–1253
https://doi.org/10.1109/TKDE.2012.55
3 HDing, G Trajcevski, PScheuermann. Efficient similarity join of large sets of moving object trajectories. In: Proceedings of the 15th International Conference on Temporal Representaion and Reasoning. 2008, 79–87
https://doi.org/10.1109/TIME.2008.25
4 C YMa, HLu, L DShou, G Chen. KSQ: top-k similarity query on uncertain trajectories. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(9): 2049–2062
https://doi.org/10.1109/TKDE.2012.152
5 GSkoumas, D Skoutas, AVlachaki. Efficient identification and approximation of k-nearest moving neighbors. In: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2013, 264–273
https://doi.org/10.1145/2525314.2525351
6 DSacharidis, D Skoutas, GSkoumas. Continuous monitoring of nearest trajectories. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2014, 361–370
https://doi.org/10.1145/2666310.2666408
7 M YYeh, K LWu, P SYu, M S Chen. Leewave: level-wise distribution of wavelet coefficients for processing knn queries over distributed streams. Proceedings of the VLDB Endowment, 2008, 1(1): 586–597
https://doi.org/10.14778/1453856.1453921
8 C CHsu, P HKung, M YYeh, S D Lin, P BGibbons. Bandwidthefficient distributed k-nearest-neighbor search with dynamic time warping. In: Proceedings of the 2015 IEEE International Conference on Big Data. 2015, 551–560
https://doi.org/10.1109/BigData.2015.7363799
9 Z GZhang, Y LWang, J LMao, S J Qiao, C QJin, A YZhou. DTKST: distributed top-k similarity query on big trajectory streams. In: Proceedings of the 22nd International Conference on Database Systems for Advanced Applications. 2017, 199–214
https://doi.org/10.1007/978-3-319-55753-3_13
10 CFaloutsos, M Ranganathan, YManolopoulos. Fast subsequence matching in time-series databases. In: Proceedings of the 1994 ACM International Conference on Management of Data. 1994, 419–429
https://doi.org/10.1145/191839.191925
11 K V RKanth, D Agrawal, A KSingh. Dimensionality reduction for similarity searching in dynamic databases. In: Proceedings of the 1998 ACM International Conference on Management of Data. 1998, 166–176
12 IPopivanov, R JMiller. Similarity search over time-series data using wavelets. In: Proceedings of the 18th International Conference on Data Engineering. 2002, 212–221
https://doi.org/10.1109/ICDE.2002.994711
13 B KYi, C Faloutsos. Fast time sequence indexing for arbitrary Lp norms. In: Proceedings of the 26th International Conference on Very Large Data Bases. 2000, 385–394
14 KChakrabarti, EKeogh, SMehrotra, M Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems, 2002, 27(2): 188–228
https://doi.org/10.1145/568518.568520
15 HCao, O Wolfson, GTrajcevski. Spatio-temporal data reduction with deterministic error bounds. The VLDB Journal, 2006, 15(3): 211–228
https://doi.org/10.1007/s00778-005-0163-7
16 A NPapadopoulos, Y Manolopoulos. Distributed processing of similarity queries. Distributed and Parallel Databases, 2001, 9(1): 67–92
https://doi.org/10.1023/A:1026509108054
17 SKashyap, PKarras. Scalable KNN search on vertically stored time series. In: Proceedings of the 17th ACM International Conference on Knowledge Discovery and Data Mining. 2011, 1334–1342
https://doi.org/10.1145/2020408.2020607
18 RVernica, M JCarey, CLi. Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 16th ACM International Conference on Management of Data. 2010, 495–506
https://doi.org/10.1145/1807167.1807222
19 YKim, KShim. Parallel top-k similarity join algorithms using mapreduce. In: Proceedings of the 28th IEEE International Conference on Data Engineering. 2012, 510–521
https://doi.org/10.1109/ICDE.2012.87
20 D ZYazti, SLin, DGunopulos. Distributed spatio-temporal similarity search. In: Proceedings of the 2006 ACM International Conference on Information and Knowledge Management. 2006, 14–23
21 CCosta, C Laoudias, D ZYazti, DGunopulos. Smarttrace: finding similar trajectories in smartphone networks without disclosing the traces. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 1288–1291
https://doi.org/10.1109/ICDE.2011.5767934
22 K PChan, A W CFu, C TYu. Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(3): 686–705
https://doi.org/10.1109/TKDE.2003.1198399
23 H PLiu, C QJin, A YZhou. Popular route planning with travel cost estimation. In: Proceedings of the 21st International Conference on Database Systems for Advanced Applications. 2016, 403–418
https://doi.org/10.1007/978-3-319-32049-6_25
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed