Please wait a minute...
Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2020, Vol. 14 Issue (6) : 146504    https://doi.org/10.1007/s11704-019-8437-1
RESEARCH ARTICLE
Guaranteeing the response deadline for general aggregation trees
Jiangfan LI, Chendie YAO, Junxu XIA, Deke GUO()
Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China
 Download: PDF(525 KB)  
 Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

It is essential to provide responses to queries within time deadlines, even if not exact and complete. To reduce the query latency, systems usually partition large-scale data computations as a series of tasks over many processes and aggregate them to reduce the response time by using aggregation trees. An obstacle is that the involved processes of a query usually differ in their speeds, thus not all processes can complete their tasks in time. This would directly degrade the response quality (the number of outputs received by the root of an aggregation tree). In this paper, we propose a general aggregation tree model, Tarot, to maximize the response quality by systematically addressing the following challenging issues: (1) fine-grained partition of the query deadline along the multi-level aggregation tree; (2) learning the distribution of durations at each level in the aggregation tree to optimize the wait durations at aggregators; (3) adaptively reassigning tasks over processes according to their status; (4) performing periodic aggregation of received outputs from the low level to avoid missing the deadline. The prior model does not consider the four aspects simultaneously. Extensive evaluations indicate that Tarot can adapt to multi-level trees and considerably improve the response quality compared to prior work while guaranteeing the query deadline.

Keywords aggregation query      performance variations      tasks reassignment     
Corresponding Author(s): Deke GUO   
Just Accepted Date: 11 September 2019   Issue Date: 26 May 2020
 Cite this article:   
Jiangfan LI,Chendie YAO,Junxu XIA, et al. Guaranteeing the response deadline for general aggregation trees[J]. Front. Comput. Sci., 2020, 14(6): 146504.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8437-1
https://academic.hep.com.cn/fcs/EN/Y2020/V14/I6/146504
1 D Guo, J Xie, X Zhou, X Zhu, W Wei, X Luo. Exploiting efficient and scalable shuffle transfers in future data center networks. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(4): 997–1009
https://doi.org/10.1109/TPDS.2014.2316829
2 Y Yuan, G Wang, L Chen, H Wang. Efficient keyword search on uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(12): 2767–2779
https://doi.org/10.1109/TKDE.2012.222
3 Y Yuan, G Wang, L Chen, H Wang. Graph similarity search on large uncertain graph databases. The International Journal on Very Large Data Bases, 2015, 24(2): 271–296
https://doi.org/10.1007/s00778-014-0373-y
4 S Agarwal, A P Iyer, A Panda, S Madden, B Mozafari, I Stoica. Blink and it’s done: interactive queries on very large data. Proceedings of the VLDB Endowment, 2012, 5(12): 1902–1905
https://doi.org/10.14778/2367502.2367533
5 T Abe, T Ueda, K Abe, H Ishibashi, T Matsuura. Aggregation skip graph: a skip graph extension for efficient aggregation query over P2P networks. International Journal on Advances in Internet Technology, 2012, 4(3–4): 103–110
6 G Ananthanarayanan, M C Hung, X Ren, I Stoica, A Wierman, M Yu. GRASS: trimming stragglers in approximation analytics. In: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation. 2014, 289–302
7 Z Ding, D Guo, X Liu, X Luo, G Chen. A mapreduce-supported network structure for data centers. Concurrency and Computation: Practice and Experience, 2012, 24(12): 1271–1295
https://doi.org/10.1002/cpe.1791
8 A I Naimi, W Daniel. Big data: a revolution that will transform how we live, work, and think. American Journal of Epidemiology. 2014, 179(9): 1143–1144
https://doi.org/10.1093/aje/kwu085
9 Y Yuan, G Wang, X J Yu, L Chen. Efficient distributed subgraph similarity matching. The International Journal on Very Large Data Bases, 2015, 24: 369–394
https://doi.org/10.1007/s00778-015-0381-6
10 G Kumar, G Ananthanarayanan, S Ratnasamy, I Stoica. Hold ’em or fold ’em?: aggregation queries under performance variations. In: Proceedings of the 11th European Conference on Computer Systems. 2016
https://doi.org/10.1145/2901318.2901351
11 J Dean, L A Barroso. The tail at scale. Communications of the ACM, 2013, 56(2): 74–80
https://doi.org/10.1145/2408776.2408794
12 D Guo, M Li. Set reconciliation via counting bloom filters. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(10): 2367–2380
https://doi.org/10.1109/TKDE.2012.215
13 H A David. Order Statistics; 3rd ed. USA: Wiley, 2003
https://doi.org/10.1002/0471722162
14 D Guo, J Wu, Y Liu, H Jin, H Chen, T Chen. Quasi-kautz digraphs for peer-to-peer networks. IEEE Transactions on Parallel and Distributed Systems, 2010, 22(6): 1042–1055
https://doi.org/10.1109/TPDS.2010.161
15 L Luo, D Guo, R T B Ma, O Rottenstreich, X Luo. Optimizing bloom filter: challenges, solutions, and comparisons. IEEE Communications Surveys and Tutorials, 2019, 21(2): 1912–1949
https://doi.org/10.1109/COMST.2018.2889329
16 J Dean, S Ghemawat. Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. 2004
17 M Zaharia, A Konwinski, A D Joseph, R Katz, I Stoica. Improving mapreduce performance in heterogeneous environments. In: Proceedings of USENIX Conference on Operating Systems Design and Implementation. 2008, 29–42
18 K Shvachko, H Kuang, S Radia, R Chansler. The hadoop distributed file system. In: Proceedings of IEEE Symposium onMass Storage Systems and Technologies. 2010, 1–10
https://doi.org/10.1109/MSST.2010.5496972
19 K Asanovic, R Bodík, J Demmel, T Keaveny, K Keutzer, J Kubiatowicz, N Morgan, D Patterson, K Sen, J Wawrzynek, D Wessel, K A Yelick. A view of the parallel computing landscape. Communications of the ACM, 2009, 52(10): 56–67
https://doi.org/10.1145/1562764.1562783
20 Z Ding, D Guo, L Xue, X Luo, G Chen. A mapreduce-supported network structure for data centers. Concurrency and Computation Practice and Experience, 2012, 24(12): 1271–1295
https://doi.org/10.1002/cpe.1791
21 Y Yuan, X Lian, L Chen, Y Sun, G Wang. RSkNN: kNN search on road networks by incorporating social influence. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(6): 1575–1588
https://doi.org/10.1109/TKDE.2016.2518692
22 S Liao, L Chen, J Li, W Xiong, Q Wu. A spatiotemporal aggregation query method using multi-thread parallel technique based on regional division. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2015, 2(4): 1
https://doi.org/10.5194/isprsannals-II-4-W2-1-2015
23 Y Tao, G Kollios, J Considine, F Li, D Papadias. Spatio-temporal aggregation using sketches. In: Proceedings of International Conference on Data Engineering. 2004, 214–225
24 Z Zhang, J Hui, X Xie, H Pan, X Feng. An online approximate aggregation query processing method based on hadoop. In: Proceedings of International Conference on Computer Supported Cooperative Work in Design. 2016, 117–122
https://doi.org/10.1109/CSCWD.2016.7565974
25 Y Yuan, X Lian, L Chen, J Yu, G Wang, Y Sun. Keyword search over distributed graphs. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(6): 1212–1225
https://doi.org/10.1109/TKDE.2017.2656079
26 D Zhang, C Y Chan, K L Tan. Processing spatial keyword query as a top-k aggregation query. In: Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval. 2014, 355–364
https://doi.org/10.1145/2600428.2609562
27 A Rogge-Solti, M Weske. Prediction of remaining service execution time using stochastic petri nets with arbitrary firing delays. In: Proceedings of International Conference on Service-Oriented Computing. 2013, 389–403
https://doi.org/10.1007/978-3-642-45005-1_27
28 B Alinia, M H Hajiesmaili, A Khonsari, N Crespi. Maximum-quality tree construction for deadline-constrained aggregation in WSNs. IEEE Sensors Journal, 2017, 17(12): 3930–3943
https://doi.org/10.1109/JSEN.2017.2701552
29 Y Xu, Z Musgrave, B Noble, M Bailey. Bobtail: avoiding long tails in the cloud. In: Proceedings of USENIX Conference on Networked Systems Design and Implementation. 2013, 329–342
30 M Alizadeh, A G Greenberg, D A Maltz, J Padhye, P Patel, B Prabhakar, S Sengupta, M Sridharan. Data center TCP (DCTCP). In: Proceedings of the ACM Special Interest Group on Data Communication. 2010, 63–74
https://doi.org/10.1145/1851275.1851192
31 G Ananthanarayanan, A Ghodsi, A Warfield, D Borthakur, S Kandula, S Shenker, I Stoica. Pacman: coordinated memory caching for parallel jobs. In: Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation. 2012, 267–280
32 M Isard, V Prabhakaran, J Currey, U Wieder, K Talwar, A Goldberg. Quincy: fair scheduling for distributed computing clusters. In: Proceeds of IEEE International Conference on Recent Trends in Information Systems. 2009, 261–276
https://doi.org/10.1145/1629575.1629601
33 S Kavulya, J Tan, R Gandhi, P Narasimhan. An analysis of traces from a production mapreduce cluster. In: Proceedings of IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. 2010, 94–103
https://doi.org/10.1109/CCGRID.2010.112
34 C Wilson, H Ballani, T Karagiannis, A I T Rowstron. Better never than late: meeting deadlines in datacenter networks. In: Proceedings of the ACM Special Interest Group on Data Communication. 2011, 50–61
https://doi.org/10.1145/2043164.2018443
35 W Xiao, W Bao, X Zhu, L Liu. Cost-aware big data processing across geo-distributed datacenters. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(11): 3114–3127
https://doi.org/10.1109/TPDS.2017.2708120
36 G Tang, K Wu, R Brunner. Rethinking cdn design with distributed time-varying traffic demands. In: Proceedings of International Conference on Computer Communications. 2017, 1–9
https://doi.org/10.1109/INFOCOM.2017.8057028
37 G Tang, H Wang, K Wu, D Guo. Tapping the knowledge of dynamic traffic demands for optimal CDN design. IEEE/ACM Transactions on Networking, 2019, 27(1): 98–111
https://doi.org/10.1109/TNET.2018.2881169
38 S Melnik, A Gubarev, J J Long, G Romer, S Shivakumar, M Tolton, T Vassilakis. Dremel: interactive analysis of web-scale datasets. Proceedings of the VLDB Endowment, 2010, 3(1–2): 330–339
https://doi.org/10.14778/1920841.1920886
[1] Article highlights Download
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed